METHODS AND COMPOSITIONS FOR MODULATING STEM CELLS 

CROSS-REFERENCE TO RELATED APPLICATIONS 

This application claims the benefit of priority to U.S. Provisional Patent 
Application Serial No. 60/447,030 (filed February 12, 2003), the disclosure of which is 
incorporated herein by reference in its entirety and for all purposes. 

FIELD OF THE INVENTION 

The present invention generally relates to methods for enriching stem cell 
populations and for modulating stem cell differentiation, as well as to therapeutic applications 
of such methods. More particularly, the invention pertains to genes differentially expressed 
in hematopoietic stem cells and to methods of using these genes to modulate stem cell 
differentiation. 

BACKGROUND OF THE INVENTION 

Hematopoiesis (hemopoiesis) is a process whereby multi-potent stem cells 
give rise to lineage-restricted progeny. The molecular basis of hematopoiesis remains poorly 
understood. Hematopoietic stem cells (HSCs) are the only cells in the hematopoietic system 
that produce other stem cells and give rise to the entire range of blood and immune system 
cells. These cells are able to self-renew, so as to maintain a continuous source of 
regenerative cells. When subject to particular environments and/or factors, they can 
differentiate into dedicated progenitor cells, each of which may serve as the ancestor of a 
limited number of blood cell types. 

HSCs and their progeny at the various developmental stages all play an 
important role in the normal function of the mammalian immune system. HSCs are of 
prominent therapeutic importance in many circumstances. In many disease states, the 
disease is a result of some defect in the maturation process. In other situations, such as 
transplantation, there is a need to prevent the immune system from rejecting the transplant 
by irradiating the host. In neoplasia, a patient may be irradiated and/or treated with 
chemotherapeutic agents to destroy the neoplastic tissue, which often also damages or destroys 
the host immune system. Further, other situations such as a severe insult to the immune 
system also result in a substantial reduction in stem cells and injury to the immune system. 
In all these situations, it will frequently be desirable to restore stem cells to the host. For 
example, HSCs are the active component in bone marrow transplantation (BMT), and 
transplant of highly purified HSCs will completely restore the hematopoietic system in a 
manner indistinguishable from unfractionated bone marrow. 

Despite decades of research, there are currently no satisfactory methods to 
expand the numbers of HSCs or accurately enumerate the numbers of expanded and 
engraftable HSCs following in vitro culture. There is a need in the art for better 
methods for isolating, enriching, and enumerating transplantable HSCs. The instant 
invention fulfills this and other needs. 

SUMMARY OF THE INVENTION 

In one aspect, the invention provides methods for inhibiting differentiation of 
mammalian stem cells. The methods entail (a) providing a population of stem cells, (b) 
introducing a vector comprising an HSC differentiation-inhibiting polynucleotide of the 
present invention into the stem cells, and (c) expressing a polypeptide encoded by the 
polynucleotide by culturing the modified stem cells, thereby inhibiting differentiation of the 
stem cells. In some of the methods, the stem cells are isolated from bone marrow. In some 
preferred methods, the stem cells are human hematopoietic stem cells. The human stem cells 
can first be selected for expression of CD34 and Thy prior to introduction of the vector. In 
some of the methods, the HSC differentiation-inhibiting polynucleotide encodes GATA-binding 
protein 3 or ID3. 

In a related aspect, the invention provides methods for increasing the 
effective dose of hematopoietic stem cells in a mammalian subject. The methods require (a) 
providing a population of hematopoietic stem cells, (b) introducing into the cells an HSC 
differentiation-inhibiting polynucleotide of the present invention, and (c) administering the 
genetically modified cells that express an HSC differentiation-inhibiting polypeptide to a 
mammalian subject, thereby increasing the effective dose of hematopoietic stem cells in the 
subject. In some of these methods, the administered stem cells are a subpopulation of the 
modified cells that are selected for expression of the polypeptide prior to administering to 
the subject. In some preferred methods, the subject is human, and the hematopoietic stem 
cells are human hematopoietic stem cells. In these methods, the hematopoietic stem cells 
can be selected for expression of CD34 and Thy prior to introducing into the cells the HSC 
differentiation-inhibiting polynucleotide. 

In another related aspect, the present invention provides methods for 
inhibiting hematopoietic stem cell differentiation using an HSC differentiation-inhibiting 
polypeptide identified by the present inventor. The methods entail contacting a population 
of HSCs with an effective amount of the HSC differentiation-inhibiting polypeptide which 
inhibits differentiation of the HSCs. In some of the methods, the HSCs are present in an in 
vitro cell culture. In some other methods, the HSCs are present in a subject grafted with the 
HSCs. In some preferred methods, the subject is human. 

In another aspect, the invention provides methods for isolating a population 
of cells that are enriched for hematopoietic stem cells (HSCs). These methods comprise (a) 
obtaining a sample of cells containing hematopoietic stem cells, (b) selecting cells from the 
sample based on expression or lack of expression of at least one known HSC surface marker, 
and at least one novel HSC molecular marker identified in the present invention, and (c) 
separating cells with the known HSC marker and at least one of the novel molecular markers, 
thereby isolating a population of cells enriched for hematopoietic stem cells. 

Preferably, the hematopoietic stem cells enriched with these methods are 
human HSCs. In some methods, the known human HSC markers are CD34+ and Thy+. In 
some of the methods, the at least one novel HSC marker is a human HSC surface molecule 
identified in the present invention. 

In another aspect, the invention provides methods for enumerating 
hematopoietic stem cells in a population of cells. The methods entail (a) contacting the 
population of cells with an antibody that specifically binds to one novel HSC surface marker 
identified in the present invention under conditions that allow the antibody to specifically 
bind to the HSC surface marker, and (b) quantifying the cells recognized by the antibody; 
thereby enumerating hematopoietic stem cells in the population of cells. In some of these 
methods, the hematopoietic stem cells are human HSCs, and the population of cells is first 
selected for expression of CD34 and Thy prior to the contacting. 
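The enumeration steps above can be illustrated with a minimal sketch. It assumes that per-cell fluorescence intensities for CD34, Thy, and one novel HSC surface marker have already been measured (for example, by flow cytometry) and that simple fixed-threshold gates define positivity. The `Cell` record, the field name `novel_marker`, and the gate values are hypothetical placeholders, not parameters taken from this disclosure.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical per-cell fluorescence intensities (arbitrary units)."""
    cd34: float
    thy: float
    novel_marker: float


def enumerate_hsc(cells, cd34_gate, thy_gate, marker_gate):
    """Count cells that pass a CD34/Thy pre-selection gate and are also
    positive for the novel HSC surface marker (threshold gates assumed).

    Returns (pre-selected count, marker-positive count, marker-positive fraction).
    """
    # Pre-select CD34+Thy+ cells using simple threshold gates.
    preselected = [c for c in cells if c.cd34 >= cd34_gate and c.thy >= thy_gate]
    # Quantify the pre-selected cells recognized by the anti-marker antibody.
    positive = [c for c in preselected if c.novel_marker >= marker_gate]
    fraction = len(positive) / len(preselected) if preselected else 0.0
    return len(preselected), len(positive), fraction


cells = [
    Cell(cd34=900, thy=500, novel_marker=800),
    Cell(cd34=950, thy=450, novel_marker=100),
    Cell(cd34=50, thy=600, novel_marker=900),   # CD34-negative, excluded
    Cell(cd34=800, thy=20, novel_marker=700),   # Thy-negative, excluded
]
print(enumerate_hsc(cells, cd34_gate=300, thy_gate=300, marker_gate=300))
```

In practice the gates would be set from appropriate controls rather than fixed by hand; the sketch only shows the order of operations (pre-selection, then counting antibody-positive cells).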
A further understanding of the nature and advantages of the present invention 
may be realized by reference to the remaining portions of the specification and claims. 

BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1 shows the schematic structure of expression vectors for overexpressing 
various HSC differentiation-inhibiting genes. 

Figure 2 shows that ID3 overexpression increases the number of colony-forming 
cells in a CFC assay. 

Figure 3 shows upregulated expression of various transcription factors in 
mouse HSCs. 

DETAILED DESCRIPTION 

I. Overview 

The present invention is predicated in part on the discovery by the present 
inventor that a number of genes are differentially expressed in hematopoietic stem cell 
populations (see Examples below). It was also found that some of these HSC genes slow 
down HSC differentiation or enhance HSC activities when they are overexpressed in HSCs. 
These genes are therefore termed HSC differentiation-inhibiting genes. 

Using HSCs enriched from the blood of normal human donors, it was found that 
sequences upregulated in the human HSCs include genes encoding hormones, enzymes, 
histones, transcription factors, secreted proteins, surface markers, and other molecules. Table 
1 lists examples of these genes that are upregulated in human HSCs (CD34+Thy+) as 
compared to non-stem cells (CD34+Thy-). Further, using HSCs isolated from two different 
sources, bone marrow and peripheral blood, the present inventor identified a set of genes that 
are differentially expressed in HSCs from both sources. Some of these genes are shown in 
Table 2. 

Similarly, in a mouse HSC population (CD34-CD38+), a number of genes 
encoding proteins with diverse biochemical and cellular functions were also upregulated, 
including genes encoding surface antigens, transcription factors, or growth factors (see 
Tables 3 and 4). These novel HSC genes are enriched in HSCs compared to their 
differentiated progeny (e.g., CD34+CD38+ progenitor cells) or CD34+CD38- facilitator 
cells. 

Without being bound by theory, the molecules upregulated in HSCs could 
play various roles in modulating HSC growth and differentiation, as well as in regulating the 
activities and functions of progenitor cells derived from the HSCs. For example, 
increased levels of some of the surface receptors, growth factors, and secreted proteins 
shown in Table 2 could act in synergy to inhibit HSC differentiation and promote HSC 
expansion. 

In accordance with these discoveries, the present invention provides methods 
for modulating HSC differentiation. Inhibition of HSC differentiation allows continued 
growth and expansion of the HSC population, and therefore provides engraftable HSCs with 
increased dosage and higher potency. A number of the upregulated HSC genes identified 
herein (e.g., shown in Tables 1, 3, and 4) can potentially function as HSC differentiation- 
inhibitors. For example, polypeptides encoded by the novel HSC genes disclosed herein 
(e.g., the growth factors or hormones shown in Table 2) can be used to inhibit HSC 
differentiation in vitro (e.g., by applying to an HSC cell culture) and in vivo (e.g., by 
administering to a subject engrafted with bone marrow or HSCs). The differentiation-inhibiting 
activities of these molecules are exemplified by GATA3 and ID3, as shown in the 
Examples below. 

As indicated by the GenBank accession numbers or other identification 
numbers or descriptions in Tables 1, 3, and 4, sequences of the upregulated human and 
mouse HSC genes disclosed herein are all known in the art. Thus, as detailed below, the 
HSC differentiation-inhibiting polynucleotide sequences can be easily obtained 
commercially, from the sources disclosed in the public databases, or isolated using routine 
techniques of molecular biology. The encoded polypeptides can also be obtained 
commercially or easily produced with standard procedures of recombinant techniques. 

The invention also provides methods for isolating and enriching HSCs. The 
currently known HSC markers are not satisfactory because they cannot accurately predict 
homogeneity and hematopoiesis activities of cells bearing the markers. The discovery of 
genes differentially expressed in HSCs provides novel molecular markers for selecting and 
enriching HSCs. For example, antibodies against novel surface markers disclosed in the 
present invention (e.g., those in Tables 2, 3, 4 and 5) can be used to isolate human and 
mouse HSCs from a crude population of cells (e.g., bone marrow or peripheral blood). The 
methods can also be directed to cell populations already enriched for one or more of the 
known HSC markers (e.g., CD34+ and Thy+ in humans, and CD38+, c-kit+, and Sca1+ in mice). 
Further enrichment using these novel markers can lead to more homogeneous HSCs with 
more potent hematopoiesis activities. 

In both the autologous and allogeneic settings, the time to recover from BMT 
depends directly on the dose of HSCs transplanted. Even a modest 2- to 3-fold expansion of 
engraftable HSC would afford great benefit to patients by minimizing the duration of 
cytopenia when patients are most susceptible to infection. Thus, isolation and expansion of 
more homogeneous HSCs in vitro in accordance with the present invention would make 
autologous and allogeneic HSC transplantation safer and more effective. 

The practice of the present invention will employ, unless otherwise indicated, 
conventional techniques of cell biology, molecular biology, cell culture, immunology and 
the like, which are within the skill of one in the art. These techniques are fully disclosed in the art, 
e.g., in Sambrook et al., "Molecular Cloning: A Laboratory Manual," Cold Spring Harbor 
Laboratory Press (3rd ed. 2001); Carter and Sweet, "Methods in Enzymology," Academic 
Press (1997); and Harlow and Lane, "Antibodies, A Laboratory Manual," Cold Spring 
Harbor Press (1998). 

The following sections provide more specific guidance for making and using 
the compositions of the invention, and for carrying out the methods of the invention. 

Table 1. Genes upregulated in human CD34+Thy+ HSCs from peripheral blood 

Classification | Name | Description 
Histone | H2BFL | Homo sapiens H2B histone family, member A 
Histone | H2AFA | Human histone genes 
Histone | H2A/1 | Homo sapiens H2A histone family, member L 
Histone | H1F2 | Histone 2A-like protein gene 
Histone | H2B/h | Homo sapiens H2B histone family, member H 
Histone | HH2A/C | Human histone H2AFC gene 
Histone | H2AFQ | Homo sapiens H2A histone family, member Q 
HLA | HLA-DPB1 | Human MHC class II lymphocyte antigen beta chain 
HLA | HLA-DQB1 | Human MHC class II HLA-DR2-Dw12 mRNA DQw1-beta 
HLA | HLA-E | Homo sapiens HLA-E gene 
Secreted-complement | PTS | Homo sapiens 6-pyruvoyltetrahydropterin synthase 
Secreted-complement | HFL1 | Human factor H homologue mRNA, complete cds 
Secreted-growth factor | MDK | Homo sapiens midkine (neurite growth-promoting factor 2) 
Secreted-hormone | OXT | Homo sapiens oxytocin, prepro-(neurophysin I) mRNA 
Secreted-hormone | AVP | Homo sapiens arginine vasopressin mRNA 
Signaling-GTP | R-Ras | Human R-ras 
Signaling-GTP | GCHFR | Homo sapiens GTP cyclohydrolase I feedback regulatory protein 
Signaling-GTP | GUCY1A3 | Homo sapiens guanylate cyclase 1, soluble, alpha 3 
Signaling-Kinase | WAF1 | Human DNA sequence from PAC 431A14 (WAF1) 
Signaling-Kinase | ITPKB | Homo sapiens inositol 1,4,5-triphosphate 3-kinase B 
Signaling-Kinase | PRKCL | Homo sapiens protein kinase C, eta 
Signaling-Kinase | PRKCZ | Homo sapiens protein kinase C, zeta 
Signaling-SH3 | SKAP55 | Homo sapiens src kinase-associated phosphoprotein of 55 kDa 
Stress | PTGS2 | Homo sapiens prostaglandin-endoperoxide synthase 2 
Stress | CYP2A13 | Human cytochrome P450 
Stress | CYP2D6 | Human mRNA for cytochrome P450 db1 variant b 
Stress-apoptosis | BCL2A1 | Homo sapiens BCL2-related protein A1 
Structural | CALB1 | Homo sapiens calbindin 1 
Structural | Elastin | Human elastin gene 
Structural | KRT18 | Human mRNA fragment for cytokeratin 18 
Surface-Ig | IGM | Human gene for immunoglobulin mu 
Surface-Ig | VH4 | Human IgM heavy chain variable V-D-J region (VH4) gene 
Surface-other | APP | Homo sapiens APP complete sequence 
Surface-receptor | BDKRB1 | Human bradykinin B1 receptor 
Surface-receptor | TLR1 | Human mRNA for KIAA0012 gene 
Surface-receptor | 5T4 | Homo sapiens 5T4 oncofetal trophoblast glycoprotein 
Surface-receptor | EFL-2 | Homo sapiens EHK1 receptor tyrosine kinase ligand 
Surface-receptor | EVI2A | Homo sapiens ecotropic viral integration site 2A 
Surface-receptor | FLT3 | Homo sapiens fms-related tyrosine kinase 3 
Surface-receptor | TNFSF10 | Human tumor necrosis factor (ligand) superfamily, member 10 
Surface-receptor | LTB | Human lymphotoxin beta 
Surface-receptor | CDW52 | Homo sapiens mRNA for CAMPATH-1 
Surface-receptor | CLECSF2 | Homo sapiens C-type lectin (activation-induced) 
Surface-unknown | GliPR | Human glioma pathogenesis-related protein 
Transport | LRP | Homo sapiens lrp mRNA 
Transcription-RUNT | AML1 | Human AML1 protein 
Transcription-PAR-bZIP | TEF | Human hepatic leukemia factor 
Transcription-FKH | FKHR | Homo sapiens forkhead protein 
Transcription-suppressor | MN1 | Homo sapiens chromosome 22q11.2 MDR region 
Transcription-bHLH | ID1 | Homo sapiens inhibitor of DNA binding 1 
Transcription-bHLH | ID3 | Homo sapiens HLH 1R21 mRNA for helix-loop-helix protein 
Transcription-bHLH | EPAS1 | Homo sapiens endothelial PAS domain protein 1 
Transcription-bHLH | ID2 | Homo sapiens inhibitor of DNA binding 2 
Transcription-GATA | HGATA3 | Homo sapiens GATA-binding protein 3 
Transcription-HMG | hTcf-4 | Homo sapiens mRNA for hTCF-4 
Transcription-HOX | PHOX1 | Human homeobox protein 
Transcription-HOX | MEIS1 | Homo sapiens MEIS protein 
Transcription-splicing | RBP-MS | Homo sapiens RNA-binding protein gene with multiple splicing 
Transcription-Translation | TCEA2 | Homo sapiens transcription elongation factor A 
Unknown | DIF2 | IEX-1 (radiation-inducible immediate-early gene) 
Unknown | | Homo sapiens chromosome 17 clone hRPC.906_A_24 
Unknown | | Homo sapiens chromosome 22q13 BAC clone CIT987SK-384D8 
Unknown | A-362G6.1 | Human chromosome 16 BAC clone CIT987SK-A-362G6 
Unknown | LST1 | Homo sapiens LST1 mRNA 
Unknown | KIAA0125 | Homo sapiens KIAA0125 gene product 



Table 2. Genes Upregulated in Human HSCs from both Bone Marrow and Peripheral Blood 

Classification | Name | Description 
Hormone | AVP | Homo sapiens arginine vasopressin mRNA 
Hormone | | Corticotropin releasing hormone-binding protein 
Enzyme | GUCY1A3 | Homo sapiens guanylate cyclase 1, soluble, alpha 3 
Enzyme | PRKCZ | Homo sapiens protein kinase C, zeta 
Enzyme | | Iduronate 2-sulfatase (Hunter syndrome) 
Transcription factor | HLF | Human hepatic leukemia factor 
Transcription factor | GATA3 | Homo sapiens GATA-binding protein 3 
Transcription | EVI1 | Homo sapiens ecotropic viral integration site 1 
Transcription | PMX1 | Paired mesoderm homeo box 1 
Transcription | MN1 | Meningioma (disrupted in balanced translocation) 
Secreted protein | | Tetranectin (plasminogen-binding protein) 
Secreted protein | | H factor (complement)-like 1 
Surface molecule | | Transient receptor potential channel 1 
Surface molecule | DLK1 | Delta-like homolog (Drosophila) 
Surface molecule | EphA3 | Ephrin-A3 
Surface molecule | TNFSF10 | Human tumor necrosis factor (ligand) superfamily, member 10 
Surface molecule | | Interferon induced transmembrane protein 
Surface molecule | | Ecotropic viral integration site 2A 
Surface molecule | | Sortilin-related receptor, L(DLR class) A rep 
Surface molecule | | Major histocompatibility complex, class I, E 
Surface molecule | | KIAA0125 gene product 

II. Definition 

Unless defined otherwise, all technical and scientific terms used herein have 
the same meaning as commonly understood by those of ordinary skill in the art to which this 
invention pertains. The following references provide one of skill with a general definition of 
many of the terms used in this invention: Singleton et al., Dictionary of Microbiology 
and Molecular Biology (2d ed. 1994); The Cambridge Dictionary of Science and 
Technology (Walker ed., 1988); and Hale & Marham, The HarperCollins Dictionary 
of Biology (1991). In addition, the following definitions are provided to assist the reader in 
the practice of the invention. 

The term "analog" is used herein to refer to a molecule that structurally 
resembles a reference molecule but which has been modified in a targeted and controlled 
manner, by replacing a specific substituent of the reference molecule with an alternate 
substituent. Compared to the reference molecule, an analog would be expected, by one 
skilled in the art, to exhibit the same, similar, or improved utility. Synthesis and screening 
of analogs, to identify variants of known compounds having improved traits (such as higher 
binding affinity for a target molecule) is an approach that is well known in pharmaceutical 
chemistry. 

As used herein, "contacting" has its normal meaning and refers to combining 
two or more agents (e.g., polypeptides or small molecule compounds) or combining agents 
and cells (e.g., a polypeptide and a cell). Contacting can occur in vitro, e.g., combining two 
or more agents or combining a test agent and a cell or a cell lysate in a test tube or other 
container. Contacting can also occur in a cell or in situ, e.g., contacting two polypeptides in 
a cell by coexpression in the cell of recombinant polynucleotides encoding the two 
polypeptides, or in a cell lysate. 

An "effective amount or dose" is an amount sufficient to effect beneficial or 
desired results. An effective amount may be administered in one or more administrations. 
Determination of an effective amount is within the capability of those skilled in the art. 
Particularly preferred subjects of the invention in general include living mammals such as 
humans, mice, and rabbits; humans are most preferred. The administration of an HSC 
differentiation-inhibiting polypeptide, or a genetically modified cell comprising a 
polynucleotide sequence of the invention, may be by conventional means, for example, 
injection, oral administration, inhalation and others. Appropriate carriers and diluents may be 
included in the administration of the polypeptide or the modified cells. Samples including 
the modified cells and progeny thereof may be taken and tested to determine transduction 
efficiency. 

The term "fragment," when used in connection with an amino acid sequence, 
means a part of a reference sequence having at least 10 amino acid residues, preferably 
50 amino acid residues, even more preferably 100 amino acid residues and most preferably 
200 amino acid residues, which are substantially identical to the reference amino acid 
sequence. When referring to a nucleotide sequence, the term means a nucleotide sequence 
including part of the reference sequence and comprising at least 30, 50, 75, 80, 100 
or more contiguous nucleotides, preferably at least 200, 300, 400, 500, 600, or more 
contiguous nucleotides, even more preferably at least 800, 1000, 1500, 2000 or more 
contiguous nucleotides that are identical to the reference sequence. 

The term "functional equivalent" when referring to a polypeptide means a 
protein having a like function and like or improved specific activity, and a similar amino 
acid sequence. In some embodiments, a functional equivalent is a variant in which one or 
more amino acid residues are substituted with conservative or non-conservative amino acid 
residues, or one in which one or more amino acid residues includes a substituent group. 
Conservative substitutions are the replacements, one for another, among the aliphatic amino 
acids Ala, Val, Leu and Ile; interchange of the hydroxyl residues Ser and Thr; exchange of the 
acidic residues Asp and Glu; substitution between the amide residues Asn and Gln; exchange of 
the basic residues Lys and Arg; and replacements among the aromatic residues Phe and Tyr. 
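The conservative-substitution groups listed above can be encoded directly. The following sketch classifies each position at which a variant differs from an ungapped reference of equal length; the function names are hypothetical, and the sketch covers only the six groups named in this definition.

```python
# Conservative-substitution groups from the definition above (one-letter codes:
# A=Ala, V=Val, L=Leu, I=Ile, S=Ser, T=Thr, D=Asp, E=Glu, N=Asn, Q=Gln,
# K=Lys, R=Arg, F=Phe, Y=Tyr).
CONSERVATIVE_GROUPS = [
    frozenset("AVLI"),  # aliphatic
    frozenset("ST"),    # hydroxyl
    frozenset("DE"),    # acidic
    frozenset("NQ"),    # amide
    frozenset("KR"),    # basic
    frozenset("FY"),    # aromatic
]


def is_conservative(a: str, b: str) -> bool:
    """True if replacing residue a with residue b falls within one group.
    An unchanged residue is not counted as a substitution."""
    a, b = a.upper(), b.upper()
    if a == b:
        return False
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def classify_substitutions(reference: str, variant: str):
    """Compare two equal-length, ungapped sequences position by position
    and label every difference (positions are 1-based)."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    result = []
    for pos, (r, v) in enumerate(zip(reference, variant), start=1):
        if r != v:
            kind = "conservative" if is_conservative(r, v) else "non-conservative"
            result.append((pos, r, v, kind))
    return result


print(classify_substitutions("MKVLST", "MRILSW"))
```

Here K to R and V to I fall within the basic and aliphatic groups respectively, while T to W crosses groups and is reported as non-conservative.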

A "heterologous sequence" or a "heterologous nucleic acid," as used herein, 
is one that originates from a source foreign to the particular host cell, or, if from the same 
source, is modified from its original form. Thus, a heterologous gene in a host cell includes 
a gene that, although being endogenous to the particular host cell, has been modified. 
Modification of the heterologous sequence can occur, e.g., by treating the DNA with a 
restriction enzyme to generate a DNA fragment that is capable of being operably linked to 
the promoter. Techniques such as site-directed mutagenesis are also useful for modifying a 
heterologous nucleic acid. 

The term "homologous" when referring to proteins and/or protein sequences 
indicates that they are derived, naturally or artificially, from a common ancestral protein or 
protein sequence. Similarly, nucleic acids and/or nucleic acid sequences are homologous 
when they are derived, naturally or artificially, from a common ancestral nucleic acid or 
nucleic acid sequence. Homology is generally inferred from sequence similarity between 
two or more nucleic acids or proteins (or sequences thereof). The precise percentage of 
similarity between sequences that is useful in establishing homology varies with the nucleic 
acid and protein at issue, but as little as 25% sequence similarity is routinely used to 
establish homology. Higher levels of sequence similarity, e.g., 30%, 40%, 50%, 60%, 70%, 
80%, 90%, 95% or 99% or more can also be used to establish homology. Methods for 
determining sequence similarity percentages, e.g., BLASTP and BLASTN using default 
parameters, are well known and described in the art. 

The terms "identical sequence" and "sequence identity" in the context of two 
nucleic acid sequences or amino acid sequences refer to the residues in the two sequences 
which are the same when aligned for maximum correspondence over a specified comparison 
window. A "comparison window", as used herein, refers to a segment of at least about 20 
contiguous positions, usually about 50 to about 200, more usually about 100 to about 150 in 
which a sequence may be compared to a reference sequence of the same number of 
contiguous positions after the two sequences are aligned optimally. Methods of alignment of 
sequences for comparison are well-known in the art. Optimal alignment of sequences for 
comparison may be conducted by the local homology algorithm of Smith and Waterman 
(1981) Adv. Appl. Math. 2:482; by the alignment algorithm of Needleman and Wunsch 
(1970) J. Mol. Biol. 48:443; by the search for similarity method of Pearson and Lipman 
(1988) Proc. Natl. Acad. Sci. U.S.A. 85:2444; by computerized implementations of these 
algorithms (including, but not limited to, CLUSTAL in the PC/Gene program by 
IntelliGenetics, Mountain View, CA; and GAP, BESTFIT, BLAST, FASTA, or TFASTA in 
the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science 
Dr., Madison, Wis., U.S.A.). The CLUSTAL program is well described by Higgins and 
Sharp (1988) Gene 73:237-244; Higgins and Sharp (1989) CABIOS 5:151-153; Corpet et al. 
(1988) Nucleic Acids Res. 16:10881-10890; Huang et al. (1992) Computer Applications in 
the Biosciences 8:155-165; and Pearson et al. (1994) Methods in Molecular Biology 
24:307-331. Alignment is also often performed by inspection and manual alignment. 
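As an illustration of the global alignment approach of Needleman and Wunsch cited above, the following minimal sketch fills the dynamic-programming score matrix and traces back one optimal alignment. The scoring values (match +1, mismatch -1, linear gap penalty -1) are illustrative assumptions; they are not the defaults of GAP, BESTFIT, or any other program named above, and protein alignments would normally use a substitution matrix such as BLOSUM62 together with affine gap penalties.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Global alignment with a linear gap penalty (illustrative scoring).

    Returns (score, aligned_seq1, aligned_seq2); gaps are written as '-'.
    """
    def pair(a, b):
        return match if a == b else mismatch

    n, m = len(seq1), len(seq2)
    # score[i][j] = best score for aligning seq1[:i] with seq2[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(seq1[i - 1], seq2[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in seq2
                score[i][j - 1] + gap,  # gap in seq1
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    a1, a2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(seq1[i - 1], seq2[j - 1]):
            a1.append(seq1[i - 1])
            a2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a1.append(seq1[i - 1])
            a2.append("-")
            i -= 1
        else:
            a1.append("-")
            a2.append(seq2[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(a1)), "".join(reversed(a2))


print(needleman_wunsch("GATTACA", "GATCA"))
```

When several alignments share the optimal score, the traceback order (diagonal first, then a gap in the second sequence) decides which one is reported.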

The term "isolated" means that the material is removed from its original 
environment (e.g., the natural environment if it is naturally occurring). For example, a 
naturally-occurring nucleic acid, polypeptide, or cell present in a living animal is not 
isolated, but the same polynucleotide, polypeptide, or cell separated from some or all of the 
coexisting materials in the natural system, is isolated, even if subsequently reintroduced into 
the natural system. Such nucleic acids can be part of a vector and/or such nucleic acids or 
polypeptides could be part of a composition, and still be isolated in that such vector or 
composition is not part of its natural environment. When referring to a cell population, it 
means that homogeneous cells expressing a given set of molecular markers constitute at least 
60%, preferably 75%, more preferably 90%, and most preferably 95% of the total number of 
cells in the population. 
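The purity levels in the definition of an isolated cell population can be expressed as a simple lookup. This sketch assumes that the number of cells expressing the required marker set and the total cell count are already known (e.g., from a gating analysis); the tier labels are shorthand introduced here for the 60%, 75%, 90%, and 95% levels.

```python
# Purity levels from the definition of an "isolated" cell population,
# checked from the strictest to the least strict.
PURITY_TIERS = [
    (0.95, "most preferred"),
    (0.90, "more preferred"),
    (0.75, "preferred"),
    (0.60, "minimum"),
]


def purity_tier(marker_positive: int, total: int):
    """Return (fraction, tier label or None) for a population in which
    `marker_positive` of `total` cells express the given marker set."""
    if total <= 0:
        raise ValueError("total must be positive")
    fraction = marker_positive / total
    for threshold, label in PURITY_TIERS:
        if fraction >= threshold:
            return fraction, label
    return fraction, None  # below 60%: not "isolated" under this definition


print(purity_tier(920, 1000))  # 92% of cells express the marker set
```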

The term "substantially identical," as applied to nucleic acid or amino acid sequences, 
means that a nucleic acid or amino acid sequence comprises a sequence that has at least 90% 
sequence identity, preferably at least 95%, more preferably at least 98% and most 
preferably at least 99%, compared to a reference sequence using the programs described 
above (preferably BLAST) using standard parameters. For example, the BLASTN program 
(for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, 
M=5, N=-4, and a comparison of both strands. For amino acid sequences, the BLASTP 
program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the 
BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 
(1992)). Percentage of sequence identity is determined by comparing two optimally aligned 
sequences over a comparison window, wherein the portion of the polynucleotide sequence in 
the comparison window may comprise additions or deletions (i.e., gaps) as compared to the 
reference sequence (which does not comprise additions or deletions) for optimal alignment 
of the two sequences. The percentage is calculated by determining the number of positions 
at which the identical nucleic acid base or amino acid residue occurs in both sequences to 
yield the number of matched positions, dividing the number of matched positions by the total 
number of positions in the window of comparison and multiplying the result by 100 to yield 
the percentage of sequence identity. Preferably, the substantial identity exists over a region 
of the sequences that is at least about 50 residues in length, more preferably over a region of 
at least about 100 residues, and most preferably the sequences are substantially identical over 
at least about 150 residues. In a most preferred embodiment, the sequences are substantially 
identical over the entire length of the coding regions. 
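The percent-identity calculation described above (matched positions divided by the number of positions in the comparison window, multiplied by 100) can be sketched as follows. The sketch assumes the two sequences are already optimally aligned, that gaps are written as `-`, and that the comparison window spans every column of the alignment, gap columns included.

```python
def percent_identity(aligned1: str, aligned2: str, gap: str = "-") -> float:
    """Percent identity over a comparison window of two aligned sequences.

    A position counts as matched when both sequences carry the same residue
    (case-insensitive) and it is not a gap; the denominator is the total
    number of positions in the window.
    """
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have equal length")
    if not aligned1:
        raise ValueError("comparison window is empty")
    matched = sum(
        1
        for a, b in zip(aligned1.upper(), aligned2.upper())
        if a == b and a != gap
    )
    return 100.0 * matched / len(aligned1)


print(percent_identity("GATTACA", "GAT--CA"))
```

For the alignment GATTACA / GAT--CA, 5 of the 7 columns match, giving about 71.4% identity.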

The terms "nucleic acid" and "polynucleotide" refer to a deoxyribonucleotide 
or ribonucleotide polymer in either single- or double-stranded form, and unless otherwise 
limited, encompasses known analogues of natural nucleotides that hybridize to nucleic acids 
in a manner similar to naturally occurring nucleotides. A "polynucleotide sequence" is a 
nucleic acid (i.e., a polymer of nucleotides such as A, C, T, U, and G, or of naturally occurring or 
artificial nucleotide analogues) or a character string representing a nucleic acid, depending 
on context. Either the given nucleic acid or the complementary nucleic acid can be 
determined from any specified polynucleotide sequence. 
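Determining the complementary strand from a specified polynucleotide sequence is a character-level mapping, sketched below for DNA written 5' to 3'. Treating U as pairing with A and mapping the ambiguity code N to N are assumptions of this sketch; other IUPAC ambiguity codes are rejected rather than complemented.

```python
# Watson-Crick pairing; U (from RNA input) is paired with A, and N maps to N.
_PAIRS = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")
_ALLOWED = set("ACGTUNacgtun")


def reverse_complement(seq: str) -> str:
    """Return the complementary DNA strand, also written 5' to 3'."""
    unexpected = set(seq) - _ALLOWED
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    # Complement each base, then reverse so the result reads 5' to 3'.
    return seq.translate(_PAIRS)[::-1]


print(reverse_complement("ATGGCC"))
```

A palindromic site such as GAATTC is its own reverse complement, which is why the same recognition sequence is found on both strands.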

The term "operably linked" refers to a functional relationship between two or 
more polynucleotide (e.g., DNA) segments. Typically, it refers to the functional relationship 
of a transcriptional regulatory sequence to a transcribed sequence. For example, a promoter 
or enhancer sequence is operably linked to a coding sequence if it stimulates or modulates 
the transcription of the coding sequence in an appropriate host cell or other expression 
system. Generally, promoter sequences that are operably linked to 
a transcribed sequence are physically contiguous to the transcribed sequence, i.e., they are 
cis-acting. However, some transcriptional regulatory sequences, such as enhancers, need not 
be physically contiguous or located in close proximity to the coding sequences whose 
transcription they enhance. A polylinker provides a convenient location for inserting coding 
sequences so the genes are operably linked to the promoter. Polylinkers are polynucleotide 
sequences that comprise a series of three or more closely spaced restriction endonuclease 
recognition sequences. 
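As a rough illustration of the polylinker definition, the following Python sketch scans a DNA sequence for a few well-known restriction sites and reports whether sites for at least three different enzymes start within a short stretch. The enzyme table is a small illustrative subset, and the 60 bp window standing in for "closely spaced" is a hypothetical parameter, since the text does not define a spacing. All of the listed sites are palindromic, so scanning one strand is sufficient. Function names are hypothetical.

```python
# Illustrative subset of restriction enzymes and their (palindromic) recognition sequences.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XbaI": "TCTAGA",
    "SalI": "GTCGAC",
    "PstI": "CTGCAG",
    "KpnI": "GGTACC",
    "SmaI": "CCCGGG",
}


def find_sites(seq: str) -> list[tuple[int, str]]:
    """Return (start position, enzyme) for every recognition site, sorted by position."""
    seq = seq.upper()
    hits = []
    for enzyme, site in RECOGNITION_SITES.items():
        start = seq.find(site)
        while start != -1:
            hits.append((start, enzyme))
            start = seq.find(site, start + 1)
    return sorted(hits)


def looks_like_polylinker(seq: str, min_sites: int = 3, max_span: int = 60) -> bool:
    """True if sites for at least `min_sites` different enzymes start within `max_span` bp."""
    hits = find_sites(seq)
    for i, (first_pos, _) in enumerate(hits):
        enzymes = set()
        for pos, enzyme in hits[i:]:
            if pos - first_pos > max_span:
                break  # remaining hits are even farther away
            enzymes.add(enzyme)
            if len(enzymes) >= min_sites:
                return True
    return False
```

With this sketch, the 18 bp sequence `GAATTCGGATCCAAGCTT` (EcoRI, BamHI, and HindIII sites back to back) qualifies, while the same three sites separated by 100 bp spacers do not.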

As used herein, the term "overexpression" refers to expression of a
polypeptide brought about by genetic modification of a host cell with a nucleic acid 
sequence encoding the polypeptide. Overexpression may take place in cells normally 
lacking expression of the polypeptide (e.g., an HSC differentiation-inhibiting polypeptide). 
It can also occur in cells with endogenous expression of the polypeptide. While 
overexpression may take place in any cell type, preferred host cells for overexpressing an 
HSC differentiation-inhibiting polypeptide are hematopoietic stem cells. 

The terms "polypeptide" and "protein" are used interchangeably herein, and 
refer to a polymer of amino acid residues, e.g., as typically found in proteins in nature. A 
"mature protein" is a protein which is full-length and which, optionally, includes 
glycosylation or other modifications typical for the protein in a given cell membrane. 

A "variant" of a molecule such as an HSC differentiation-inhibiting
polypeptide refers to a molecule substantially similar in structure and biological
activity to either the entire molecule or a fragment thereof. Thus, provided that two
molecules possess a similar activity, they are considered variants as that term is used herein
even if the composition or secondary, tertiary, or quaternary structure of one of the
molecules is not identical to that found in the other, or if the sequence of amino acid residues
is not identical. In some embodiments, a variant differs in amino acid sequence from a
reference polypeptide by one or more substitutions, additions, deletions, or truncations, which
may be present in any combination. Among preferred variants are those that vary from a
reference polypeptide by conservative amino acid substitutions, i.e., substitutions that replace
a given amino acid with another amino acid of like character. In the following non-limiting
list, the amino acids within each group are considered conservative replacements for one
another: a) alanine, serine, and threonine; b) glutamic acid and aspartic acid; c) asparagine
and glutamine; d) arginine and lysine; e) isoleucine, leucine, methionine, and valine; and
f) phenylalanine, tyrosine, and tryptophan. Most highly preferred are variants that retain the
same biological function and activity as the reference polypeptide from which they vary.
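The conservative-replacement groups a) through f) can be encoded directly as sets of one-letter amino acid codes. The Python sketch below, with hypothetical function names, compares a reference and a variant of equal length and flags each substitution as conservative or not. Additions, deletions, and truncations are outside its scope, and it applies only the non-limiting groups listed above.

```python
# Groups a) to f) from the text, as one-letter amino acid codes.
CONSERVATIVE_GROUPS = (
    frozenset("AST"),   # a) alanine, serine, threonine
    frozenset("ED"),    # b) glutamic acid, aspartic acid
    frozenset("NQ"),    # c) asparagine, glutamine
    frozenset("RK"),    # d) arginine, lysine
    frozenset("ILMV"),  # e) isoleucine, leucine, methionine, valine
    frozenset("FYW"),   # f) phenylalanine, tyrosine, tryptophan
)


def is_conservative(ref_aa: str, var_aa: str) -> bool:
    """True if two different residues fall in the same group."""
    ref_aa, var_aa = ref_aa.upper(), var_aa.upper()
    if ref_aa == var_aa:
        return False  # identical residues are not a substitution
    return any(ref_aa in g and var_aa in g for g in CONSERVATIVE_GROUPS)


def list_substitutions(reference: str, variant: str) -> list[tuple[int, str, str, bool]]:
    """(0-based position, reference residue, variant residue, conservative?) per substitution."""
    if len(reference) != len(variant):
        raise ValueError("substitution-only comparison needs equal-length sequences")
    return [
        (pos, r, v, is_conservative(r, v))
        for pos, (r, v) in enumerate(zip(reference.upper(), variant.upper()))
        if r != v
    ]
```

For instance, a K-to-R change (group d) or an I-to-L change (group e) is flagged as conservative, whereas a K-to-D change is not.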

III. Promoting HSC Expansion by Inhibiting Differentiation 

In addition to novel markers and methods for isolating HSCs, the invention 
also provides methods for inhibiting or blocking differentiation of mammalian hematopoietic 
stem cells, thereby promoting expansion of the stem cells. A number of the novel HSC 
marker genes identified in the present invention can inhibit or block HSC differentiation. 
Examples of such differentiation-inhibiting genes are shown in Tables 1 and 2 (for human 
HSC) and Tables 3 and 4 (for mouse HSC). For example, as described in the Examples
below, overexpression of GATA-binding protein 3 in human stem cells slows differentiation of
the cells, and HSCs overexpressing ID3 showed increased numbers of colony-forming cells,
indicating enhanced HSC activity as compared to a control. These differentiation-inhibiting
molecules can be used in the present invention to inhibit HSC differentiation and thereby
promote expansion in vitro. They can also be used in vivo to increase the effective dose of engrafted HSCs in a
subject. 

The term "HSC differentiation-inhibiting molecules" (polynucleotides and the
encoded polypeptides) includes the molecules shown in Tables 1-4 that inhibit or slow HSC
differentiation. Polynucleotides with substantial sequence identity are also encompassed. In
addition, the term includes variants, analogs, fragments, or functional derivatives of the HSC
differentiation-inhibiting molecules shown in Tables 1-4. These differentiation-inhibiting
molecules can be obtained from any species. Preferably, they are from mammalian species
such as human and mouse, or from other vertebrate species such as chicken. The HSC
differentiation-inhibiting molecules can also be from any source, whether natural, synthetic,
or recombinant.

Differentiation is defined as the restriction of the potential of a cell to self-renew
and is normally associated with a change in the functional capacity of the cell. The
terms "inhibiting" and "blocking" differentiation are used broadly in the context of this
invention and include not only the prevention of differentiation but also the altering or
slowing of the differentiation process of a cell. Differentiation of a stem cell can be
determined by methods well known in the art, including analysis of surface markers
associated with cells of a defined differentiated state.

An HSC differentiation-inhibiting polynucleotide of the present invention
encodes an HSC differentiation-inhibiting polypeptide that blocks or slows down
differentiation of HSCs (e.g., as listed in Tables 1-4). As shown in the Tables, these
molecules include hormones, secreted proteins, or growth factors. These molecules also 
include transcription factors. One or more of these HSC differentiation-inhibiting 
polypeptides, or fragments thereof, can be applied to HSC cells in vitro, e.g., in a cell 
culture. These cells can be cultured and grown as described herein or other methods well 
known in the art. The appropriate amount of these differentiation-inhibiting polypeptides to 
be used in the cultures can be easily determined in accordance with stem cell culturing 
procedures described herein or knowledge well known in the art. By culturing the HSC in 
the presence of these molecules, differentiation of the cells can be inhibited or slowed, 
resulting in enhanced growth of engraftable HSCs. 

In addition to promoting HSC expansion in vitro, the HSC differentiation- 
inhibiting polypeptides of the invention can also be administered directly to a subject to 
promote in vivo growth of HSCs. For example, a subject engrafted with bone marrow or a 
population of HSCs can also be administered an effective amount of an HSC differentiation- 
inhibiting polypeptide or fragment thereof (e.g., the secreted proteins or growth factors 
shown in Table 1 and Tables 3-4). The polypeptide can be administered to the subject prior 
to, concurrently with, or subsequent to transplantation of the bone marrow or HSCs. 
Preferably, the polypeptide and the HSCs are administered to the subject simultaneously. 

Other than using a differentiation-inhibiting polypeptide, inhibition of HSC
differentiation can also be achieved by using an HSC differentiation-inhibiting polynucleotide
to genetically modify HSCs. HSC differentiation-inhibiting polynucleotides suitable for
these methods include some of the genes upregulated in HSCs (as shown in Tables 1 and 3).
They encode HSC differentiation-inhibiting polypeptides that block or slow down
differentiation of HSCs. Some of these methods first require isolating a
population of hematopoietic cells, e.g., a population of CD34+ Thy+ human cells or CD34-
CD38+ mouse cells as described above, from a source of such cells. An HSC
differentiation-inhibiting polynucleotide of the invention can then be introduced into the
cells, whereby the cells are genetically modified.

Once the cells are genetically modified, they are cultured in the presence of at 
least one cytokine in an amount sufficient to support growth of the modified cells. The 
modified cells in which the encoded polypeptide is overexpressed and differentiation is
blocked are then selected. The genetically modified cells thus obtained may be used
immediately (e.g., in transplantation), cultured and expanded in vitro, or stored for later
use. The modified HSCs may be stored by methods well known in the art, e.g., frozen in
liquid nitrogen.

Genetic modification as used herein encompasses any method of introducing an
exogenous or foreign gene into mammalian cells (particularly human stem cells and
hematopoietic cells). The term includes but is not limited to
transduction (viral mediated transfer of host DNA from a host or donor to a recipient, either 
in vitro or in vivo), transfection (transformation of cells with isolated viral DNA genomes), 
liposome mediated transfer, electroporation, calcium phosphate transfection or 
coprecipitation and others. Methods of transduction include direct co-culture of cells with 
producer cells (Bregni et al., Blood 80:1418-1422, 1992) or culturing with viral supernatant 
alone with or without appropriate growth factors and polycations (Xu et al., Exp. Hemat. 
22:223-230, 1994). 

Various in vitro and in vivo assays are well known in the art for the 
measurement of the functional compositions of hematopoietic cell populations. See, e.g., 
Quesenberry et al. eds., Stem Cell Biology and Gene Therapy, Wiley-Liss Inc., 1998,
Chapter 5, Hematopoietic Stem cells: Proliferation, Purification and Clinical Applications,
pp. 133-160. Other examples of suitable assays are also known in the art. For example, the
long-term culture-initiating cell (LTCIC) assay involves culturing a cell population on
stromal cell monolayers for approximately 5 weeks and then testing in a 2-week semisolid
media culture for the frequency of clonogenic cells retained (Sutherland et al., Blood
74:1563 (1989)). The Colony Forming Cells (CFC) assay or Colony-Forming Unit Culture
(CFUC) assay expresses cell counts as the number of colony-forming units per unit
volume or area of a sample. The assay is used to measure clonal growth of quickly maturing
progenitors in semi-solid media supplemented with serum and growth factors. Depending
on the growth factors used to stimulate growth, mature and/or primitive progenitors may be
detected. Cobblestone area forming colony (CAFC) assays measure clonal proliferation
of long-lived progenitors supported by stromal cell monolayers and growth factor/serum
supplemented media. On the appropriate stromal monolayers, cells pluripotent for myeloid
and lymphoid lineages may be detected (Young et al., Blood 88:1619, 1996). SCID-hu
bone assays measure the proliferation and multilineage differentiation of cells with bone 
marrow repopulating activity. These cells are likely to contribute to durable engraftment in 
clinical transplantation. SCID-hu thymus assays measure the proliferation and differentiation 
in thymocytes. Both bone marrow repopulating and more mature T-lineage progenitors may 
be measured. 
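The colony-count arithmetic behind the CFC/CFUC readout can be sketched as follows. Averaging replicate dishes and dividing by the sample volume plated per dish gives colony-forming units per ml. Normalizing instead to the number of cells plated (here per 10^5 cells) is a common alternative and is an assumption of this sketch rather than something the text specifies. Function names are hypothetical.

```python
from statistics import mean


def cfu_per_ml(colony_counts: list[int], volume_plated_ml: float) -> float:
    """Mean colonies per replicate dish divided by the sample volume plated per dish."""
    if not colony_counts or volume_plated_ml <= 0:
        raise ValueError("need at least one count and a positive plated volume")
    return mean(colony_counts) / volume_plated_ml


def cfc_per_cells(colony_counts: list[int], cells_plated: int, per: int = 100_000) -> float:
    """Mean colonies per dish scaled to `per` input cells (e.g., CFC per 10^5 cells)."""
    if not colony_counts or cells_plated <= 0:
        raise ValueError("need at least one count and a positive cell number")
    return mean(colony_counts) * per / cells_plated
```

For example, three replicate dishes scoring 40, 50, and 60 colonies from 0.5 ml of sample each correspond to 100 CFU per ml.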

A polynucleotide encoding an HSC differentiation-inhibiting molecule is 
typically introduced to a host cell in a vector. The vector typically includes the necessary 
elements for the transcription and translation of the inserted coding sequence. Methods used 
to construct such vectors are well known in the art. For example, techniques for constructing 
suitable expression vectors are described in detail in Sambrook et al., Molecular Cloning: A 
Laboratory Manual, Cold Spring Harbor Press, N.Y. (3rd Ed., 2000); and Ausubel et al.,
Current Protocols in Molecular Biology, John Wiley & Sons, Inc., New York (1999). 

Vectors may include but are not limited to viral vectors, such as baculovirus, 
retroviruses, adenoviruses, adeno-associated viruses, and herpes simplex viruses; 
bacteriophages; cosmids; plasmid vectors; synthetic vectors; and other recombination 
vehicles typically used in the art. Vectors containing both a promoter and a cloning site into 
which a polynucleotide can be operatively linked are well known in the art. Such vectors are 
capable of transcribing RNA in vitro or in vivo, and are commercially available from sources 
such as Stratagene (La Jolla, Calif.) and Promega Biotech (Madison, Wis.). Specific 
examples include pSG, pSV2CAT, and pXT1 from Stratagene; and pMSG, pSVL, pBPV, and
pSVK3 from Pharmacia.

Preferred vectors include retroviral vectors (see Coffin et al., "Retroviruses",
Chapter 9, pp. 437-473, Cold Spring Harbor Laboratory Press, 1997). Vectors useful in the
invention can be produced recombinantly by procedures well known in the art. For example,
WO 94/29438, WO 97/21824 and WO 97/21825 describe the construction of retroviral
packaging plasmids and packaging cell lines. Exemplary vectors include the pCMV
mammalian expression vectors, such as pCMV6b and pCMV6c (Chiron Corp.), pSFFV-Neo,
and pBluescript-SK+. Non-limiting examples of useful retroviral vectors are those
derived from murine, avian or primate retroviruses. Common retroviral vectors include
those based on the Moloney murine leukemia virus (MoMLV-vector). Other MoMLV-derived
vectors include Lmily, LINGFER, MINGFR and MINT (Chang et al., Blood 92:1-11, 1998).
Additional vectors include those based on Gibbon ape leukemia virus (GALV),
Moloney murine sarcoma virus (MoMSV), and spleen focus forming virus (SFFV).
Vectors derived from the murine embryonic stem cell virus (MESV) include MESV-MiLy
(Agarwal et al., J. of Virology, 72:3720-3728, 1998). Retroviral vectors also include vectors
based on lentiviruses, and non-limiting examples include vectors based on human
immunodeficiency virus (HIV-1 and HIV-2).

In producing retroviral vector constructs, the viral gag, pol and env sequences 
can be removed from the virus, creating room for insertion of foreign DNA sequences. 
Genes encoded by foreign DNA are usually expressed under the control of a strong viral
promoter in the long terminal repeat (LTR). Selection of appropriate regulatory control
sequences depends on the host cell used and is within the skill of one in the art.
Numerous promoters are known in addition to the promoter of the LTR. Non-limiting
examples include the phage lambda PL promoter; the human cytomegalovirus (CMV)
immediate early promoter; the U3 region promoter of the Moloney Murine Sarcoma Virus
(MMSV), Rous Sarcoma Virus (RSV), or Spleen Focus Forming Virus (SFFV); the Granzyme
A promoter; the Granzyme B promoter; the CD34 promoter; and the CD8 promoter.
Additionally, inducible or multiple control elements may be used.

Such a construct can be packaged into viral particles efficiently if the gag, pol
and env functions are provided in trans by a packaging cell line. Therefore, when the vector
construct is introduced into the packaging cell, the gag-pol and env proteins produced by the
cell assemble with the vector RNA to produce infectious virions that are secreted into the
culture medium. The virus thus produced can infect and integrate into the DNA of the target
cell, but does not produce infectious viral particles since it lacks essential packaging
sequences. Most of the packaging cell lines currently in use have been transfected with
separate plasmids, each containing one of the necessary coding sequences, so that multiple
recombination events are necessary before a replication-competent virus can be produced.
Alternatively, the packaging cell line harbors a provirus. The provirus has been crippled so
that although it may produce all the proteins required to assemble infectious viruses, its own
RNA cannot be packaged into virus. RNA produced from the recombinant virus is packaged
instead. Therefore, the virus stock released from the packaging cells contains only
recombinant virus. Non-limiting examples of retroviral packaging lines include PA12,
PA317, PE501, PG13, PSI.CRIP, RD114, GP7CMTA-G10, ProPak-A (PPA-6), and PT67.
Reference is made to Miller et al., Mol. Cell Biol. 6:2895, 1986; Miller et al., Biotechniques 
7:980, 1989; Danos et al., Proc. Natl. Acad. Sci. USA 85:6460, 1988; Pear et al., Proc. Natl. 
Acad. Sci. USA 90:8392-8396, 1993; and Finer et al., Blood 83:43-50, 1994. 

Other suitable vectors include adenoviral vectors (see, Frey et al., Blood 
91:2781, 1998; and WO 95/27071) and adeno-associated viral vectors. These vectors are all 
well known in the art, e.g., as described in Chatterjee et al., Current Topics in Microbiol. and
Immunol., 218:61-73, 1996; Stem Cell Biology and Gene Therapy, eds. Quesenberry et al.,
John Wiley & Sons, 1998; and U.S. Pat. Nos. 5,693,531 and 5,691,176. The use of
adenovirus-derived vectors may be advantageous in certain situations because they are
capable of infecting non-dividing cells. Unlike retroviral DNA, adenoviral DNA is not
integrated into the genome of the target cell. Further, the capacity to carry foreign DNA is
much larger in adenoviral vectors than in retroviral vectors. Adeno-associated viral vectors
are another useful delivery system. The DNA of this virus may be integrated into
non-dividing cells, and a number of polynucleotides have been successfully introduced into
different cell types using adeno-associated viral vectors.

In some embodiments, the construct or vector will include two or more
heterologous polynucleotide sequences: a) the nucleic acid sequence encoding an HSC
differentiation-inhibiting polypeptide of the invention, and b) one or more additional nucleic
acid sequences. Preferably, the additional nucleic acid sequence is a polynucleotide that
encodes a selective marker, a structural gene, a therapeutic gene, a ribozyme, or an antisense
sequence.

A selective marker may be included in the construct or vector for the
purposes of monitoring successful genetic modification and for selection of cells into which
DNA has been integrated. Non-limiting examples include drug resistance markers, such as
resistance to G418 or hygromycin. Additionally, negative selection may be used, for example
wherein the marker is the HSV-tk gene. This gene makes the cells sensitive to agents such as
acyclovir and ganciclovir. Selection may also be made by using a cell surface marker, for
example, to select cells overexpressing an HSC differentiation-inhibiting polypeptide by
fluorescence activated cell sorting (FACS). The NeoR (neomycin/G418 resistance) gene is
commonly used, but any convenient marker gene whose sequences are not already present
in the target cell can be used. Further non-limiting examples include low-affinity
nerve growth factor receptor (NGFR), enhanced green fluorescent protein (EGFP),
the dihydrofolate reductase gene (DHFR), the bacterial hisD gene, murine CD24 (HSA), murine
CD8a (lyt), bacterial genes which confer resistance to puromycin or phleomycin, and
beta-galactosidase.

The additional polynucleotide sequence(s) may be introduced into the host 
cell on the same vector as the polynucleotide sequence encoding the polypeptides of the 
invention or the additional polynucleotide sequence may be introduced into the host cells on 
a second vector. In a preferred embodiment, a selective marker will be included on the same 
vector as the HSC differentiation-inhibiting polynucleotide. 

Typically, the host cells for expressing the HSC differentiation-inhibiting 
polynucleotide are mammalian stem cells, e.g., HSCs from humans, mice, monkeys, farm 
animals, sport animals, pets, and other laboratory rodents and animals. These cells can be 
obtained, cultured, and manipulated as described above and in Potten C. S. ed., Stem Cells, 
Academic Press, 1997; Stem Cell Biology and Gene Therapy, eds. Quesenberry et al., John 
Wiley & Sons Inc., 1998; and Gage et al., Ann. Rev. Neurosci. 18:159-192, 1995. 

IV. Novel Molecular Markers for Isolating and Enriching HSCs 

As detailed in the Examples below, the present inventor identified a number 
of genes that are differentially expressed in human and mouse HSCs. These genes, which 
can play a role in regulating hematopoiesis as well as activities of HSCs and progenitor cells, 
are suitable as markers for selecting and enriching HSCs from diverse populations of cells.
As exemplified in Tables 1-4, these HSC markers include transmembrane proteins (e.g.,
receptors), growth factors, and transcription factors, as well as other proteins with diverse
cellular and biochemical functions.

Employing these novel HSC markers, the present invention provides methods 
for isolating stem cells from any vertebrate, particularly mammalian, species. In general, 
one or more of the novel markers can be targeted in the methods. Selection with these 
markers can be performed alone with a crude population of cells (e.g., bone marrow). The 
selection scheme can also be used in combination with other selection and purification 
procedures, e.g., to further select HSCs from cells already enriched for other known HSC 
surface markers. 

In some embodiments, the novel markers for selecting and enriching HSCs 
are cell surface markers. As described in the Examples, a number of the genes upregulated 
in the human and mouse HSCs encode transmembrane proteins (see also Tables 2 and 7). 
These proteins provide novel surface markers for isolating HSCs from or enumerating HSCs 
in a population of diverse cells (e.g., bone marrow). These methods are useful for isolating
stem cells from primates (e.g., humans, monkeys, and gorillas) and from domestic animals
(e.g., bovine, equine, ovine, and porcine species). Isolation of HSCs bearing these novel markers can be performed
with the same procedures disclosed herein for the other phenotypic markers. 

In some embodiments, selection of the novel HSC markers utilizes antibodies 
that recognize the novel HSC markers. This includes preparing an antibody to a novel HSC 
marker (e.g., a surface marker) of the invention and purifying the antibody. By exposing a 
population of hematopoietic cells or crude cells to the antibody and allowing the exposed 
cells to bind with the antibody, cells bearing the novel HSC marker can be isolated. 
Techniques including antibody preparation and purification are well known and routinely 
practiced in the art. See, e.g., Harlow and Lane, Antibodies: A Laboratory Manual, Cold 
Spring Harbor Press (1998). Such antibodies encompass any antibody or fragment thereof 
either native or recombinant, synthetic or naturally derived, which retains sufficient 
specificity to bind specifically to an HSC marker. They may be monoclonal or polyclonal, 
and can be produced using the novel HSC marker protein or a fragment or variant thereof. 
In addition, antibodies that recognize some of these marker proteins may also be obtained 
commercially. 

When combined with other selection procedures, the particular order in
which hematopoietic cells are separated from other cells is not critical to this invention.
When a genetically modified HSC is to be selected (as detailed above), the specific cell
types may be separated either prior to or after genetic modification. In
some methods, crude cell samples are initially separated by negative selection for markers
indicating unwanted cells, followed by separations for markers or marker levels
indicating that the cells belong to the stem cell population, and finally by positive selection
with novel markers of the present invention. In some other methods, following the initial
crude separation, the cells can be directly subjected to enrichment for at least one of the
novel HSC markers.

For example, an initial crude cell population can be first purified to remove 
major cell families from the bone marrow or other hematopoietic cell source. A negative 
selection can then be carried out by targeting some of the cell surface antigens (e.g., Lin, 
CD34 for mouse HSCs). A further positive selection can be performed to isolate a cell 
population with specific stem cell markers (e.g., CD34 and Thy for human HSC, and c-kit, 
Sca-1, or CD38 for mouse HSC). Thereafter, additional selections can be carried out using 
one or more of the novel HSC surface markers disclosed herein. 
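The sequential scheme above (negative selection against lineage markers, then positive selection for known stem cell markers, then positive selection for a novel marker) can be modeled as a chain of filters. The Python sketch below is a simplification: each cell is reduced to the set of markers it is scored positive for, so staining intensities such as "lo" are not modeled. The lineage panel (CD2, CD3, CD14, CD56) is an illustrative choice, and `NOVEL_MARKER` is a hypothetical placeholder for one of the surface markers of Tables 1-4; all names are hypothetical.

```python
from dataclasses import dataclass

# Illustrative lineage panel for a Lin- gate (one possible choice of markers).
LINEAGE_MARKERS = frozenset({"CD2", "CD3", "CD14", "CD56"})
# Known human HSC markers used for the first positive selection.
KNOWN_HSC_MARKERS = frozenset({"CD34", "Thy-1"})
# Hypothetical placeholder for a novel surface marker from Tables 1-4.
NOVEL_MARKERS = frozenset({"NOVEL_MARKER"})


@dataclass(frozen=True)
class Cell:
    cell_id: str
    positive_for: frozenset[str]  # markers the cell is scored positive for


def negative_select(cells: list[Cell], markers: frozenset[str]) -> list[Cell]:
    """Keep cells positive for none of `markers` (e.g., a Lin- gate)."""
    return [c for c in cells if not (c.positive_for & markers)]


def positive_select(cells: list[Cell], markers: frozenset[str]) -> list[Cell]:
    """Keep cells positive for all of `markers`."""
    return [c for c in cells if markers <= c.positive_for]


def enrich_hsc(cells: list[Cell]) -> list[Cell]:
    """Negative lineage selection, then known-marker and novel-marker positive selections."""
    lin_neg = negative_select(cells, LINEAGE_MARKERS)
    known_pos = positive_select(lin_neg, KNOWN_HSC_MARKERS)
    return positive_select(known_pos, NOVEL_MARKERS)
```

Because each step only removes cells, the order of the two positive selections does not change the final set in this idealized model; in practice, the order is chosen for convenience, consistent with the statement that the particular order of separation is not critical.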

The starting cell populations for selecting and enriching HSC can be obtained 
from bone marrow or other hematopoietic source. Stem cells and progenitor cells from bone 
marrow constitute only a small percentage (e.g., about 0.01 to about 0.1%) of the bone 
marrow cells. Bone marrow cells may be obtained from a source of bone marrow, e.g. 
tibiae, femora, spine, fetal liver, and other bone cavities. Other sources of hematopoietic
stem cells include embryonic yolk sac, fetal liver, fetal and adult spleen, and blood, including
adult peripheral blood and umbilical cord blood (To et al., Blood 89:2233-2258, 1997).

Procedures for isolation of bone marrow are well known in the art. For
example, an appropriate solution may be used to flush the bone. The solution
can be a balanced salt solution conveniently supplemented with fetal calf serum or other
naturally occurring factors. These components can be present in conjunction with an 
acceptable buffer at low concentration, generally from about 5 to 25 mM. Convenient 
buffers include but are not limited to HEPES, phosphate and lactate buffers. Bone marrow 
can also be aspirated from the bone in accordance with other conventional techniques well 
known in the art. 

As indicated above, to isolate the HSCs, a relatively crude separation can
be initially used to remove major cell families from the bone marrow or other hematopoietic
cell source. Various techniques may be employed to separate the cells to initially remove
cells of dedicated lineage. These include physical separation, magnetic separation using
antibody-coated magnetic beads, affinity chromatography, and cytotoxic agents joined to a
monoclonal antibody or used in conjunction with a monoclonal antibody. Also included is
the use of fluorescence activated cell sorters (FACS), wherein the cells can be separated on
the basis of the level of staining of the particular antigens. These techniques are well known
to those of ordinary skill in the art and are described in various references, including U.S. Pat.
Nos. 5,061,620; 5,409,8213; 5,677,136; and 5,750,397; and Yau et al., Exp. Hematol.
18:219-222, 1990.

Monoclonal antibodies are particularly useful for this initial separation 
procedure. The antibodies may be attached to a solid support to allow for separation. In 
some methods, magnetic bead separations are used to attach the antibodies. Conjugating the 
antibodies with markers such as magnetic beads, e.g., using biotin-avidin link, allows for 
direct separation of bound cells from the unbound cells. Antibodies (e.g., monoclonal 
antibodies) directed to the various surface markers of these differentiated cells can be 
obtained commercially or prepared using methods routinely practiced in the art. 

To select HSCs, this initial separation allows removal of large numbers of 
cells of the hematopoietic system of various lineages, such as thymocytes, T-cells, pre-B 
cells, B-cells, granulocytes, myelomonocytic cells, and platelets. Cells that can be separated 
in this stage also include other minor cell populations, e.g., megakaryocytes, mast cells, 
eosinophils and basophils. Generally, at least about 70%, usually 80% or more, of the total
hematopoietic cells will be removed. Since there will be positive selection at the later
selection steps, it is not essential to remove every dedicated cell class at the initial stage,
such as the minor population members, the platelets, and erythrocytes. However, it is
preferable that there be negative selection for all of the cell lineages, so that in the final
positive selection the number of dedicated cells present is minimized.

Surface antigen phenotypes of the dedicated lineage cells are known in the
art. For example, CD34 is expressed on most immature T-cells, also called thymocytes, and
these cells lack cell surface expression of CD1, CD2, CD3, CD4, and CD8 antigens.
CD45RA is a useful T-cell marker. The best known T-cell marker is the T-cell receptor
(TCR). There are presently two defined types of TCRs: TCR-2 (consisting of α and β
polypeptides) and TCR-1 (consisting of δ and γ polypeptides). B cells may be selected, for
example, by expression of CD19 and CD20. Myeloid cells may be selected, for example, by
expression of CD14, CD15, and CD16. NK cells may be selected based on expression of
CD56 and CD16. Erythrocytes may be identified by expression of glycophorin A.
Compositions enriched for progenitor cells capable of differentiation into myeloid cells,
dendritic cells, or lymphoid cells also include the phenotypes CD45RA+ CD34+ Thy1+ and
CD45RA+ CD10+ Lin- CD34+. Other useful markers for various cell types are also known in
the art.

The separation techniques employed should maximize the retention of 
viability of the fraction to be collected. For the initial separations, various techniques of 
differing efficacy may be employed. The particular technique employed will depend upon 
efficiency of separation, cytotoxicity of the methodology, ease and speed of performance, 
and necessity for sophisticated equipment and/or technical skill. Procedures for separation 
may include magnetic separation, using antibody-coated magnetic beads, affinity 
chromatography, cytotoxic agents joined to a monoclonal antibody or used in conjunction 
with a monoclonal antibody, e.g., complement and cytotoxins, and "panning" with antibody
attached to a solid matrix, e.g. plate. Techniques providing accurate separation include 
fluorescence activated cell sorters, which can have varying degrees of sophistication, e.g. a 
plurality of color channels, low angle and obtuse light scattering detecting channels, and 
impedance channels. 

Following the initial coarse selection, positive and/or negative selection using
various other known stem cell markers as well as the novel HSC markers disclosed herein
can be performed. In some methods, human HSCs are isolated using markers such as CD34+
and Thy+, as discussed in the Examples below. In some methods, human HSCs are selected
for a phenotype of CD34+ Thy1+ Lin-. Other examples of enriched phenotypes include:
CD2-, CD3-, CD4-, CD8-, CD10-, CD14-, CD15-, CD19-, CD20-, CD33-, CD34-, CD38lo/-,
CD45RA, CD59+/-, CD71-, CDw109+, glycophorin-, AC133+, HLA-DR+/-, c-kit+, and EM+.
"Lin-" refers to a cell population selected on the basis of lack of expression of at least one
lineage-specific marker, for example CD2, CD3, CD14, and CD56. The combination of
expression markers used to isolate and define an enriched HSC population may vary 
depending on various factors and may vary as other expression markers become available. 

Similarly, mouse HSCs can be selected for one or more of the known markers,
such as Lin-, c-kit+, Sca-1+, CD38+, and CD34- (see Example 3). In other methods, murine
HSCs with properties similar to those of human CD34+ Thy-1+ Lin- cells may be identified
as c-kit+ Thy-1.1lo Lin-/lo Sca-1+ (KTLS). Other phenotypes are well known, e.g., as
described in U.S. Patent No. 6,451,558. When CD34 expression is combined with selection
for Thy-1, a composition comprising fewer than about 5% lineage-committed cells can be
isolated (U.S. Pat. No. 5,061,620).

Once the cells are harvested and optionally separated, the cells are cultured in 
a suitable medium comprising a combination of growth factors that are sufficient to maintain 
growth. The term culturing refers to the propagation of cells on or in media of various kinds. 
It is understood that the descendants of a cell grown in culture may not be completely 
identical (either morphologically, genetically or phenotypically) to the parent cell. Methods 
for culturing stem cells and hematopoietic cells are well known to those skilled in the art. 
Any suitable culture container may be used, and these are readily available from commercial 
vendors. The seeding level is not critical, and it will depend on the type of cells used. In 
general, the seeding level will be at least 10 cells per ml, more usually at least about 100 
cells per ml, and generally not more than 10^6 cells per ml. 
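The seeding densities above translate into simple arithmetic when setting up a culture. A minimal Python sketch, assuming a counted stock cell suspension (the helper names are hypothetical), checks a target density against the stated 10 to 10^6 cells/ml window and computes how much stock to add:

```python
MIN_DENSITY = 10.0         # cells/ml, lower bound stated in the text
MAX_DENSITY = 1_000_000.0  # cells/ml, "generally not more than 10^6 cells per ml"


def cells_to_seed(density_per_ml: float, culture_volume_ml: float) -> float:
    """Total cells needed for a target density in a given culture volume."""
    if not MIN_DENSITY <= density_per_ml <= MAX_DENSITY:
        raise ValueError(f"{density_per_ml} cells/ml is outside 10 to 1e6 cells/ml")
    return density_per_ml * culture_volume_ml


def stock_volume_ml(stock_cells_per_ml: float, density_per_ml: float,
                    culture_volume_ml: float) -> float:
    """Volume of a counted stock suspension that delivers the required cells."""
    return cells_to_seed(density_per_ml, culture_volume_ml) / stock_cells_per_ml


if __name__ == "__main__":
    # Hypothetical example: 5 x 10^4 cells/ml in 2 ml from a 10^6 cells/ml stock.
    print(cells_to_seed(5e4, 2.0))          # 100000.0
    print(stock_volume_ml(1e6, 5e4, 2.0))   # 0.1
```

The remainder of the culture volume is then made up with medium.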

Various culture media can be used and non-limiting examples include 
Iscove's modified Dulbecco's medium (IMDM), X-vivo 15 and RPMI-1640. These are 
commercially available from various vendors. The formulations may be supplemented with 
a variety of different nutrients, growth factors, such as cytokines and the like. In general, the 
term cytokine refers to any one of the numerous factors that exert a variety of effects on 
cells, such as inducing growth and proliferation. The cytokines may be human in origin or 
may be derived from other species when active on the cells of interest. Included within the 
scope of the definition are molecules having similar biological activity to wild type or 
purified cytokines, for example produced by recombinant means, and molecules which bind 
to a cytokine factor receptor and which elicit a similar cellular response as the native 
cytokine factor. 

The medium can be serum free or supplemented with suitable amounts of 
serum such as fetal calf serum, autologous serum or plasma. If cells or cellular products are 
to be used in humans, the medium will preferably be serum free or supplemented with 
autologous serum or plasma (see, e.g., Lansdorp et al., J. Exp. Med. 175:1501, 1992; and 
Petzer et al., PNAS 93:1470, 1996). 

Examples of compounds that can be used to supplement the culture medium 
are thrombopoietin (TPO), Flt3 ligand (FL), c-kit ligand (KL, also known as stem cell factor, 
SCF, or Steel factor), interleukins (e.g., IL-1, IL-2, IL-3, IL-6, soluble IL-6 receptor, IL-11, and 
IL-12), granulocyte colony-stimulating factor (G-CSF), granulocyte-macrophage colony- 
stimulating factor (GM-CSF), leukemia inhibitory factor (LIF), MIP-1α, and erythropoietin 
(EPO). These compounds may be used alone or in any combination. When murine stem 
cells are cultured, a preferred non-limiting medium includes mIL-3, mIL-6 and mSCF. 

Concentration ranges of these compounds to be used in cultures can be 
determined according to knowledge well known in the art. For example, a generally preferred 
range of TPO is from about 0.1 ng/mL to about 5000 µg/mL, more preferably from about 
1.0 ng/mL to about 1000 ng/mL, and even more preferably from about 5.0 ng/mL to about 300 
ng/mL. A preferred concentration range for each of FL and KL is from about 0.1 ng/mL to 
about 1000 ng/mL, more preferably from about 1.0 ng/mL to about 500 ng/mL. IL-6 is a 
preferred factor to be included in the culture; a preferred concentration range is from 
about 0.1 ng/mL to about 500 ng/mL, and more preferably from about 1.0 ng/mL to about 100 
ng/mL. Hyper IL-6, a covalent complex of IL-6 and IL-6 receptor, may also be used in the 
culture. 
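The nested concentration ranges above can be encoded as tiers so that a planned cytokine concentration is checked against them. This is a sketch under stated assumptions: the tier names, the `preference_tier` and `stock_volume_ul` helpers, and the treatment of "about" as an inclusive hard bound are illustrative choices, not part of the text. The TPO upper bound of 5000 µg/mL is converted to 5,000,000 ng/mL so that all bounds share one unit.

```python
from typing import Dict, List, Optional, Tuple

# Nested (low, high) bounds in ng/mL, ordered from broadest to narrowest tier.
RANGES_NG_PER_ML: Dict[str, List[Tuple[float, float]]] = {
    "TPO": [(0.1, 5_000_000.0), (1.0, 1000.0), (5.0, 300.0)],
    "FL": [(0.1, 1000.0), (1.0, 500.0)],
    "KL": [(0.1, 1000.0), (1.0, 500.0)],
    "IL-6": [(0.1, 500.0), (1.0, 100.0)],
}
TIERS = ("preferred", "more preferred", "even more preferred")


def preference_tier(factor: str, conc_ng_per_ml: float) -> Optional[str]:
    """Narrowest tier whose range contains the concentration, or None."""
    tier = None
    for name, (low, high) in zip(TIERS, RANGES_NG_PER_ML[factor]):
        if low <= conc_ng_per_ml <= high:
            tier = name
        else:
            break  # ranges are nested, so narrower tiers cannot match either
    return tier


def stock_volume_ul(stock_ug_per_ml: float, target_ng_per_ml: float,
                    final_volume_ml: float) -> float:
    """Stock volume (uL) to reach a target concentration; 1 ug/mL == 1 ng/uL."""
    return target_ng_per_ml * final_volume_ml / stock_ug_per_ml


if __name__ == "__main__":
    print(preference_tier("TPO", 50))     # even more preferred
    print(preference_tier("IL-6", 200))   # preferred
    print(stock_volume_ul(10, 50, 10))    # 50.0
```

For example, reaching 50 ng/mL TPO in 10 mL of medium from a hypothetical 10 µg/mL stock needs 500 ng, i.e., 50 µL of stock.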

Other molecules can also be added to the culture media, for instance, 
adhesion molecules, such as fibronectin or RetroNectin™ (commercially produced by 
Takara Shuzo Co., Otsu, Shiga, Japan). Fibronectin is a glycoprotein that is found throughout 
the body, and its concentration is particularly high in connective tissues where it forms a 
complex with collagen. 

V. Therapeutic Applications 

HSCs are the active component in bone marrow transplantation (BMT). 
Transplanting purified HSCs rather than whole bone marrow avoids transplanting the harmful 
non-HSC cells present in the marrow. In the autologous cancer or autoimmune setting, the use 
of purified HSCs minimizes the possibility of returning tumor or diseased cells to the patient 
along with the bone marrow. In allogeneic transplantation, using high doses of HSCs 
overcomes rejection by the recipient immune system. Thus, expansion of HSCs would make 
autologous and allogeneic HSC transplantation safer and more effective. 

The present invention provides methods for inhibiting HSC differentiation 
and promoting HSC expansion in vivo in a subject, e.g., a human subject engrafted with 
HSCs. Using HSC differentiation-inhibiting molecules identified in the present invention, 
these methods allow expansion of non-differentiated stem cells and increase the dose of 
HSCs either ex vivo or in vivo, thereby potentially allowing more rapid engraftment. The 
HSC differentiation-inhibiting molecules can be expressed in the engrafted HSCs. They can 
also be provided separately to the subject receiving the HSC graft, e.g., expressed from a 
vector introduced into the subject. In addition, the HSC differentiation-inhibiting molecules 
can also be administered to the subject as an expressed polypeptide, e.g., a growth factor. As 
a result, differentiation of the cells is blocked or slowed down, resulting in expansion of non- 
differentiated stem cells. 

Some methods of the invention provide ex vivo gene therapy for transplanting 
genetically modified HSCs into a subject. For example, vectors expressing an HSC 
differentiation-inhibiting polypeptide can be delivered to HSCs explanted from an individual 
subject, followed by reimplantation of the cells into a subject, usually after selection for cells 
that have incorporated the vector. Procedures for modifying host cells with an HSC 
differentiation-inhibiting polynucleotide (e.g., GATA3) are described above. In addition, ex 
vivo cell transfection for diagnostics, research, or gene therapy (e.g., via re-infusion of 
the transfected cells into the host organism) is well known in the art. For a review of gene 
therapy procedures, see Anderson, Science 256: 808-813, 1992; Nabel & Felgner, TIBTECH 
11: 211-217, 1993; Mitani & Caskey, TIBTECH 11: 162-166, 1993; Mulligan, Science 260: 
926-932, 1993; Dillon, TIBTECH 11: 167-175, 1993; Miller, Nature 357: 455-460, 1992; 
Van Brunt, Biotechnology 6: 1149-1154, 1988; Vigne, Restorative Neurology and 
Neuroscience 8: 35-36, 1995; Kremer & Perricaudet, British Medical Bulletin 51: 31-44, 
1995; Haddada et al., in Current Topics in Microbiology and Immunology (Doerfler & 
Böhm eds., 1995); and Yu et al., Gene Therapy 1: 13-26, 1994. 

For therapeutic applications, the genetically modified HSCs are 
maintained for a period of time sufficient for overexpression of the HSC differentiation- 
inhibiting polypeptide. A suitable time period will depend inter alia upon the cell type used and 
is readily determined by one skilled in the art. In general, genetically modified cells of the 
invention may overexpress the HSC differentiation-inhibiting polypeptide for the lifetime of the 
host cell. Preferably, for hematopoietic cells the time period will be in the range of 1 to 45 
days, more preferably 1 to 30 days, even more preferably 1 to 20 days, still more preferably 
1 to 10 days, and most preferably 1 to 5 days. 

In addition to ex vivo gene therapy, vectors expressing an HSC differentiation- 
inhibiting polypeptide can also be delivered in vivo by administering the expression vector 
to an individual subject, typically by systemic administration (e.g., intravenous, 
intraperitoneal, intramuscular, subdermal, or intracranial infusion) or topical application. 
Methods for in vivo gene therapy are also well known in the art, e.g., as described in the 
literature noted above. 

As described above, in addition to gene therapy, therapeutic expansion of HSCs 
in a subject can also be achieved by directly administering an HSC differentiation-inhibiting 
polypeptide (or a fragment or functional derivative thereof) to the subject. The subject can be 
simultaneously engrafted with HSCs, or can be one that has not received an HSC transplant. 
Typically, in such applications, the HSC differentiation-inhibiting polypeptide (e.g., GATA3) 
is administered to the subject in a pharmaceutical composition. The pharmaceutical 
compositions typically comprise at least one active ingredient together with one or more 
acceptable carriers. Suitable carriers for preparing the pharmaceutical compositions, 
appropriate dosages, and suitable routes of administration of the compositions can all be 
readily determined by following methods well known in the art. See, e.g., Gilman et al., eds., 
Goodman and Gilman's The Pharmacological Basis of Therapeutics, 8th ed., Pergamon 
Press, 1990; Remington: The Science and Practice of Pharmacy, 20th ed., Mack Publishing 
Co., 2000; Avis et al., eds., Pharmaceutical Dosage Forms: Parenteral Medications, Marcel 
Dekker, Inc., N.Y., 1993; and Lieberman et al., eds., Pharmaceutical Dosage Forms: Tablets, 
Marcel Dekker, Inc., N.Y., 1990. 

EXAMPLES 

The following examples are provided to illustrate, but not to limit, the present 
invention. 

Example 1. Genes Upregulated in Human HSCs 

This Example describes RNA profiling of human hematopoietic stem cells 
and characterization of genes upregulated in the HSCs. All procedures and assays employed 
herein to study the human HSCs have been described in the art, e.g., as noted above. 

CD34+ cells were first isolated from the blood of six normal human donors using 
magnetic beads. Fluorescence-activated cell sorting (FACS) was then used to purify CD34+ Thy+ 
(stem-enriched) and CD34+ Thy- (stem-depleted) cell populations. The two populations of 
cells (12 samples in total: 6 CD34+ Thy+ and 6 CD34+ Thy-) were assayed for bioactivity with 
the colony-forming cell (CFC) assay. RNA profiling (Thy+ vs. Thy-) was then carried out to identify 
genes differentially expressed in stem cells. Results of the profiling are shown in Table 1. The 
data indicate that the upregulated genes encode proteins with diverse biochemical and 
cellular functions. 

In addition, genes upregulated in CD34+ Thy+ HSCs from two different 
sources, bone marrow and peripheral blood, were compared to identify overlapping genes 
enriched in HSCs from both sources. A total of 30 genes were found to be upregulated in 
HSCs from both sources. An exemplary list of these genes is shown in Table 2. HSCs from 
both sources express transcription factors, some of which are known proto-oncogenes 
(e.g., GATA3, HLF, EVI1, PMX1, MN1, ATF3). 
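The cross-source comparison amounts to intersecting two lists of upregulated genes. The text does not state the fold-change criterion, so the sketch below assumes a hypothetical cut-off of 2.0 and uses made-up fold values purely for illustration (the function name is also hypothetical):

```python
from typing import Dict, List


def shared_upregulated(bm_fold: Dict[str, float], pb_fold: Dict[str, float],
                       min_fold: float = 2.0) -> List[str]:
    """Genes at or above `min_fold` in both sources, sorted alphabetically."""
    bm_up = {g for g, f in bm_fold.items() if f >= min_fold}
    pb_up = {g for g, f in pb_fold.items() if f >= min_fold}
    return sorted(bm_up & pb_up)


if __name__ == "__main__":
    # Hypothetical HSC vs. non-HSC fold changes, for illustration only.
    bone_marrow = {"GATA3": 4.0, "HLF": 3.0, "ID1": 1.1}
    peripheral_blood = {"GATA3": 5.0, "HLF": 2.5, "ID1": 3.2}
    print(shared_upregulated(bone_marrow, peripheral_blood))  # ['GATA3', 'HLF']
```

In this toy input ID1 passes only in peripheral blood, so it is excluded from the shared list.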

Further, the results indicate that HSCs from peripheral blood, but not HSCs 
from bone marrow, are enriched in histones and in the inhibitory HLH transcription factors 
ID1, ID2, and ID3. The data also suggest new cell surface markers for HSCs, for example 
5T4, EphA3, TNFSF3, EVI2B, and DLK1. Several potential neuropeptides are also upregulated, 
including vasopressin (AVP), oxytocin (OXT), and vasodilators. 

Example 2. Inhibition of HSC Differentiation by Overexpressing an HSC Differentiation- 
Inhibiting Polypeptide 

This Example describes the effects on HSC differentiation of constitutive 
expression of an HSC differentiation-inhibiting gene in CD34+ Thy+ cells using retroviral 
vectors. First, the effect of overexpressing ID3 was analyzed with the colony-forming cell (CFC) 
assay. Other assays, such as the cobblestone area-forming cell (CAFC) assay and the NOD/SCID 
(nonobese diabetic mice with severe combined immunodeficiency disease) repopulating cell 
assay, can also be used in these analyses. These assays can be performed as described above 
and are well known in the art (e.g., Kusadasi et al., Leukemia 14: 1944-53, 2000; and 
Larochelle et al., Nature Medicine 2: 1329-1337, 1996). 

Fig. 1 illustrates the schematic structure of the retroviral vectors used in the 
study. Gene X in the figure denotes any of the HSC genes (e.g., ID3) to be examined. 
The vectors also express green fluorescent protein (GFP). When cells are transfected or 
infected with the GFP gene, the encoded GFP fluoresces green under ultraviolet light, 
enabling simple detection of the transfected or infected cells. 

A vector harboring the HSC gene (e.g., ID3 or GATA3) was transfected into 
the CD34+ cells. Cells expressing the gene were sorted and assayed with the CFC assay. As 
shown in Fig. 2, ID3 over-expression increased the number of colony-forming cells (e.g., 
primitive BFU-E colonies). This suggests enhanced HSC activity, indicating that 
differentiation of the stem cells was slowed. 

The HSC differentiation-inhibiting genes were also examined for their effects 
on HSC growth in liquid culture. Specifically, the effect of GATA3 over-expression on human 
HSC differentiation was examined. Stem cells were transfected with the same vectors 
described above (harboring the ID1 gene, the GATA3 gene, or no HSC gene) and grown in 
liquid culture. CD34+ GFP+ cells were sorted, and expression of CD34 was monitored during 
the culture. Untransfected cells were used as a control. The results indicate that, compared 
with the control, ID1 had no effect on differentiation of the CD34+ cells. However, expression 
of GATA3 significantly slowed the differentiation process, as indicated by the rate of 
reduction of CD34+ cells. 

Example 3. Novel Molecular Markers Expressed in Mouse HSCs 

This Example describes use of RNA expression profiling to characterize 
purified mouse HSCs. Mouse HSCs were purified using a combination of antibodies to cell- 
surface markers. The following three cell populations were purified from murine bone 
marrow as described in Zhao et al., Blood 96: 3016-22, 2000; and Zhong et al., Blood 100: 
3521-6, 2002. 

Cell type | Immunophenotype | HSC activity 
LT-HSC | Lin-, c-kit+, Sca-1+, CD38+, CD34- | 1X 
Facilitator cells | Lin-, c-kit+, Sca-1-, CD38-, CD34+ | 0.1X 
Progenitor cells | Lin-, c-kit+, Sca-1+, CD38+, CD34+ | 0.1X 



Cells were purified from normal BL6 mice using flow cytometry. Three 
different preparations of sorted cells for each population were prepared and combined prior 
to the isolation of total RNA. The RNA was quantified using the RiboGreen fluorescence- 
based solution assay (e.g., as described in Jones et al., Anal Biochem 265: 368-74, 1998). 
Ten nanograms of each pooled RNA preparation was labeled in duplicate using the triple 
labeling procedure (as described, e.g., in Hrabovszky et al., J. Histochem. Cytochem. 43: 
363-370, 1995) and hybridized to Affymetrix U74A GeneChips according to the 
manufacturer's instructions. Intensity values were obtained for each gene and sample using 
GeneChip software. These average difference (AD) values were exported to a spreadsheet 
program and analyzed by filtering for genes that were expressed above a threshold (an AD 
value of 50 in at least two samples), whose average expression differed more than 2-fold 
(>2X or <0.5X) between any two cell populations, and for which ANOVA showed a significant 
difference (P<0.01) between any two populations. 
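The three-step filter (expression threshold, fold change, ANOVA) can be sketched in Python as follows. Assumptions not stated in the text: each gene maps to three groups of AD values ordered (LT-HSC, facilitator, progenitor); ">2X or <0.5X" is read as a greater-than-2-fold difference in either direction; and group means are floored at 1.0 before taking ratios, because AD values can be zero or negative. With three groups the ANOVA numerator has 2 degrees of freedom, for which the F survival function has the closed form p = (1 + 2F/d2)^(-d2/2), so no statistics library is needed. The function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

AD_THRESHOLD = 50.0    # AD value a sample must exceed...
MIN_SAMPLES_ABOVE = 2  # ...in at least this many samples
MIN_FOLD = 2.0         # >2-fold between any two populations, either direction
ALPHA = 0.01           # ANOVA significance level

Groups = Sequence[Sequence[float]]


def f_survival_df1_2(f: float, df_within: int) -> float:
    """P(F > f) for an F distribution with numerator df = 2 (closed form)."""
    return (1.0 + 2.0 * f / df_within) ** (-df_within / 2.0)


def anova_three_groups(groups: Groups) -> Tuple[float, float]:
    """One-way ANOVA for exactly three groups; returns (F, p)."""
    if len(groups) != 3:
        raise ValueError("this closed-form p-value needs exactly three groups")
    values = [v for g in groups for v in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_within = len(values) - 3
    if ss_within == 0:  # no replicate scatter: F is infinite or undefined
        return (float("inf"), 0.0) if ss_between > 0 else (0.0, 1.0)
    f = (ss_between / 2) / (ss_within / df_within)
    return f, f_survival_df1_2(f, df_within)


def passes_filters(groups: Groups) -> bool:
    values = [v for g in groups for v in g]
    if sum(v > AD_THRESHOLD for v in values) < MIN_SAMPLES_ABOVE:
        return False
    means = [max(mean(g), 1.0) for g in groups]
    if max(a / b for a in means for b in means) <= MIN_FOLD:
        return False
    return anova_three_groups(groups)[1] < ALPHA


def filter_genes(data: Dict[str, Groups]) -> List[str]:
    """Genes passing all three filters, in input order."""
    return [gene for gene, groups in data.items() if passes_filters(groups)]


if __name__ == "__main__":
    # Hypothetical duplicate AD values per population, for illustration only.
    data = {
        "GeneA": [[1000, 1010], [100, 110], [105, 95]],  # passes
        "GeneB": [[200, 210], [205, 195], [190, 200]],   # < 2-fold
        "GeneC": [[10, 20], [5, 8], [30, 40]],           # below threshold
        "GeneD": [[100, 900], [100, 120], [110, 90]],    # not significant
    }
    print(filter_genes(data))  # ['GeneA']
```

Taking the maximum ratio over all ordered pairs of means covers both the up (>2X) and down (<0.5X) cases in a single comparison.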

Examples of genes upregulated in HSCs are shown in Table 3. The genes 
were analyzed for patterns using GeneSpring software and arranged by functional gene 
classification using Gene Ontology (GO). Accession numbers or identification numbers from other 
public databases of these genes, as well as levels of up-regulation of these genes in HSCs as 
compared to non-HSCs, are also shown in the table. 

Example 4. Characterization of Genes Differentially Expressed in Mouse HSCs 

To correlate stem cell activity of the three subsets with gene expression, a 
hypothetical stem cell activity pattern corresponding to the in vivo repopulating activity of 
the three subsets was generated and compared with the normalized expression levels 
of each differentially expressed gene identified above. Principal component analysis 
(PCA) of the stem cell expression data was performed to identify gene expression patterns. 
PCA is an unsupervised computational method used to identify major patterns in diverse data 
types, including gene expression data (Alter et al., Proc Natl Acad Sci USA 97:10101-10106, 
2000; and Holter et al., Proc Natl Acad Sci USA 97:8409-8414, 2000). Correlation 
analysis of the expression patterns of the differentially expressed genes with stem cell 
activity identified genes with highly significant correlations (Pearson R > 0.95). These genes 
are shown in Table 4. In addition to genes upregulated in HSCs, the analysis also identified 
genes whose expression negatively correlated with LTR HSC activity (i.e., down-regulated 
expression). Examples of these genes are shown in Table 5. 
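The correlation step can be illustrated with a short Python sketch that scores each gene's three-subset profile against the activity pattern from the Example 3 table (LT-HSC 1X, facilitator cells 0.1X, progenitor cells 0.1X). Assumptions: profiles are ordered (LT-HSC, facilitator, progenitor), and a symmetric cut-off of -0.95 is used for negative correlation, which the text does not specify. The function names are hypothetical. Because Pearson r is unchanged by positive linear rescaling of either profile, the normalization applied to expression levels does not affect the result.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Relative HSC activity of the three subsets (LT-HSC, facilitator, progenitor).
ACTIVITY_PATTERN = (1.0, 0.1, 0.1)
R_CUTOFF = 0.95


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        return 0.0  # a flat profile has undefined r; treat it as uncorrelated
    return sum(a * b for a, b in zip(dx, dy)) / den


def classify(profiles: Dict[str, Sequence[float]],
             cutoff: float = R_CUTOFF) -> Tuple[List[str], List[str]]:
    """Split genes into positively and negatively activity-correlated lists."""
    positive, negative = [], []
    for gene, levels in profiles.items():
        r = pearson_r(levels, ACTIVITY_PATTERN)
        if r > cutoff:
            positive.append(gene)
        elif r < -cutoff:
            negative.append(gene)
    return positive, negative


if __name__ == "__main__":
    # Hypothetical normalized profiles, for illustration only.
    profiles = {"up": (10, 1, 1), "down": (1, 10, 10),
                "flat": (5, 5, 5), "mixed": (1, 10, 1)}
    print(classify(profiles))  # (['up'], ['down'])
```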

Some of the differentially expressed genes were further analyzed and 
classified according to their biological functions. The results are shown in Table 6. As 
shown in Tables 3, 4, and 6, the genes upregulated in mouse HSCs also encode proteins with 
diverse biological properties, similar to the genes upregulated in human HSCs. For example, 
a number of transmembrane proteins were enriched in the mouse HSCs, as exemplified in 
Table 7. These molecules can be useful as novel surface markers for isolating HSCs. Some 
of the transcription factors that are upregulated in the mouse HSCs are shown in Table 8. Their 
expression levels in the CD34-CD38+ HSCs relative to those in the facilitator cells 
(CD38-CD34+) and progenitor cells (CD34+CD38+) are shown in Figure 3. 

The expression of several known transcriptional regulators was found to 
correlate positively with LTR HSC activity. These include Cited2, Gata3, Hdac3, Irf6, JunB, 
Nmyc1, Rnps1, Xbp1, and Zfp292. Little is known regarding the role of these specific 
transcription factors in the control of HSC biology. These transcription factors 
could play an important role in regulating HSC development and differentiation. 

To determine if any of the differentially expressed transcription factors are 
themselves regulating transcription in LTR HSCs, we performed a search of putative 
upstream regulatory regions (10 kb upstream of start codons) of the interrogated genes for 
binding sites of the nine transcription factors. Statistical analysis of these results revealed 
that only the binding sites of GATA were significantly enriched (P<0.05) within the 
differentially expressed genes. Interestingly, this list contains a large fraction (20 of 52) of 
the genes whose expression positively correlated with HSC activity, suggesting the 
possibility that Gata may play an important role in the control of LTR HSC biology. A 
small number of genes (3 of 20) whose expression is negatively correlated with HSC activity 
also contained Gata binding sites, suggesting the possibility that low levels of Gata 
expressed in STR HSCs may influence gene expression at later stages. 
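The text reports GATA-site enrichment at P<0.05 but does not name the statistical test. One common choice for "is a motif over-represented in this gene list?" is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The sketch below assumes that test; it is not necessarily the method used, and the function name is hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes searched, K: genes in N carrying the motif,
    n: genes in the list of interest, k: genes in that list carrying the motif.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


if __name__ == "__main__":
    # Toy check: all 5 draws from 10 genes land on the 5 motif-bearing genes.
    print(hypergeom_sf(5, 5, 5, 10))  # 1/252
```

Testing the "20 of 52" observation would be written as `hypergeom_sf(20, 52, K, N)`, where K (motif-bearing genes in the background) and N (total genes searched) must be supplied. The text does not report either value.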

To confirm the data from expression profiling, we performed semi-quantitative 
RT-PCR on total RNA extracted from the three BM subsets for three of the LTR 
HSC genes identified. These included the transcription factors Gata3 and JunB, and the 
thrombopoietin receptor c-Mpl. The results demonstrated that all three mRNAs are 
expressed at significantly higher levels in CD38+CD34- cells than in the other two 
subsets. 



Table 3. Genes Upregulated in Mouse HSCs 



Symbol 


Description 


RefSeq 


Swiss Prot Keywords 


HSC/non- 
HSC 


AU044919 


expressed sequence AU0449 1 9 


AU044919 


Glycoprotein 
Immunoglobulin C region 
Immunoglobulin domain 


79.7 


Klf2 


Kruppel-like factor 2 (lung) 


NM 008452 


Activator DNA-binding 
Metal-binding Nuclear 
protein Repeat 
Transcnption regulation 
Zinc-finger 


44.9 


Carl 


carbonic anhydrase 1 


NM 009799 


Lyase Zinc 


36.8 


Mm.220154 


Mus musculus anti-HIV-1 reverse 
transcriptase single-chain variable 
fragment mRNA, complete cds 


NA 


None 


JU. 1 


2010309G21 Rik 


RlrKEN cDNA 2010309G21 gene 


none 


Immunoglobulin C region 
Immunoglobulin domain 


28.8 


XT A 

IN A 


M80423:Mus castaneus IgK chain 
gene, C-region, J ena /cas— \\)*5ZZ) 
/gb=M80423/gi= 196865 
/ug t= ivini. £ ioov/ < t /ien— j/j rnt\.i> /\ 


IVIoUHZJ 


None 


20.9 


Fragilis 


Fragilis 


1N1V1 UZjj / o 


None 


1 7. 1 


Smoc 1 


SPARC related modular calcium 
binding l 


INlVl UZZj 1 O 


None 


1 5 8 


JOjU^ 1 j dUoKlK 


KIPwCIN CL/lN/\ JOJUtl jCUo gCllC 


IN 1VI Ut"u<5 J 




14.9 


583043 lAlORik 


RIK.EN cDN A 583043 1 A 1 0 gene 


none 


None 


14.4 


A1325941 


expressed sequence AI325941 


AI325941 


None 


14.2 


Cdknlc 


cycl in-dependent kinase inhibitor 1C 
(P57) 


NM 009876 


Alternative splicing Cell 
cycle 


14.1 


Lisch7 


liver-specific bHLH-Zip transcription 
factor 


none 


None 


13.9 


AW108012 


expressed sequence AW 1080 12 


AW108012 


None 


13.8 


Akrlcl3 


aldo-keto reductase family 1 , member 
C13 


NM 013778 


None 


13.3 


0910001 L24Rik 


RIK.EN cDNA 0910001 L24 gene 


NM 022419 


None 


12.7 


A1842353 


expressed sequence AI842353 


AI842353 


None 


11.7 


Tgm2 


transglutaminase 2, C polypeptide 


nm uuyj/j 


Acyltransferase Calcium- 
binding Transferase 


1 1 A 


Nckap 1 


NCK-associated protein 1 


none 


Transmembrane 


11.3 


Serpina3g 


serine (or cysteine) proteinase 
inhibitor, clade A, member 3G 


none 


None 


11.3 


1700008C22Rik 


RIKEN cDNA 1700008C22 gene 


none 


None 


10.4 


Nmycl 


neuroblastoma myc -related oncogene 
1 


NM 008709 


DNA-binding Nuclear 
protein Phosphorylation 
Proto-oncogene 


10.4 


Zfhxla 


zinc finger homeobox 1 a 


NM 01 1546 


Activator DNA-binding 
Homeobox Metal-binding 
Nuclear protein Repeat 
Repressor Transcription 
regulation Zinc-finger 


10.4 


H2-Ebl 


histocompatibility 2, class II antigen 
E beta 


NM 010382 


Glycoprotein MHC II 
Signal Transmembrane 


1 0.0 


AU044919 


expressed sequence AU044919 


AU044919 


Glycoprotein 
Immunoglobulin C region 
Immunoglobulin domain 


9.9 
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Gbp2 


guanylate nucleotide binding protein 
2 


NM 010260 


None 


9.5 


Gabbrl 


garnma-aminobutyric acid (GABA-B) 
receptor, 1 


NM 019439 


Alternative ^nlieinp fniler! 

coil G-protein coupled 
receptor Glycoprotein 
Postsynaptic membrane 
Repeat Signal 
Transmembrane 


9.5 


D8Ertd69e 


DNA segment, Chr 8, ERATO Doi 
69, expressed 


none 


None 


9.2 


Gata3 


GATA binding protein 3 


NM 008091 


Activator DNA-binding 
Nuclear protein T-cell 
Transcription regulation 
Zinc-finger 


9.1 


C130052II2Rik 


RIKEN cDNA C 13005211 2 gene 


NM 146047 


None 


8.7 


0610025U9Rik 


RIKEN cDNA 0610025119 gene 


NM 029555 


None 


8.6 


Ten 5 


transcription factor 1 5 


NM 009328 


None 


8.6 


H2-Aa 


Viict/"\rT»tnrt'»tiV"»ilitAv flncc II antioi^ 
1 1 IMOLU1I IJJiill Ul 11 ly Z., Cldab 11 (lllllgCIl 

A alnhn 

n, a i yj i in 


NM 010378 


3D-structure Glycoprotein 

MUr II <sionnl 
Tran^mpmhrarip 


8.5 


Tall 


T-cell acute lymphocytic leukemia 1 


NM 01 1527 


f^hrnmosnmnl 

translocation 
Differentiation DNA- 
binding Phosphorylation 
Proto-oncogene 
Transcription regulation 


8.3 


Myozl 


myozenin 1 


NM 021508 


None 


7.9 


493042 U07Rik 


RIK.EN c DNA 493042 1 J07 gene 


none 


None 


7.4 


leh-6 


immunoglobulin heavy chain 6 
(heavy chain of IgM) 


none 


Alternative splicing 
vjiycuproiciii 
Immunoglobulin C region 
Immunoglobulin domain 
Transmembrane 


7.3 


Hoxb5 




1 >l 1VI \J\J O £. oo 


Developmental protein 
DNA-binding Homeobox 
Nuclear protein 

1 I til 1 JJiiUI 1 1 CgUltlLlUll 


7.3 


Col9al 


procollagen, type IX, alpha 1 


NM 007740 


Alternative splicing 

til 11 lagC \_/UlIagdl 
f'onnfvti vp ticctif* 

Extracellular matrix 
Glycoprotein 
Hydroxy lation Repeat 
Signal 


7.2 


Meisl 


myeloid ecotropic viral integration 
site 1 


NM 010789 


None 


7.1 


Elal 


elastase I , pancreatic 


none 


None 


7.0 


Hiatl 


hippocampus abundant gene transcript 
1 


NM 008246 


None 


7.0 


Fah 


fumarylacetoacetate hydrolase 


NM 010176 


Hydrolase Phenylalanine 
entnholism Tvmsine 
catabolism 


6.9 


Cypfl3 


cytochrome P450 CYP4F13 


NM 130882 


None 


6.7 


NA 


:Mus musculus transcription factor 
PBX3b (PBX3b) mRNA, complete 
cds /cds=( 1 1 8, 1 1 73) /gb=AF020200 
/gi=2432016/ug=Mm.7331 
/len=2467 mRNA 


AF020200 


None 


6.5 


iRi 


immunoglobulin joining chain 


NM 152839 


Glycoprotein Signal 


6.3 


NA 


:AV336991 Mus musculus cDNA, 3 
end /clone=6332407A0 1 
/clone_end=3 /gb=AV336991 
/gi=63 77043 /ug=Mm.99212 
/len=201 /NOTE=replacement for 
probe set(s) 100264 f at on MG- 


AV336991 


None 


6.2 
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U74A mRNA 








Ctla2b 


cytotoxic T lymphocyte-associated 
protein 2 beta 


none 


Repeat Signal T-cell 


6.1 


Serpinb6 


serine (or cysteine) proteinase 
inhibitor, clade B, member 6 


NM 009254 


Serine protease inhibitor 
Serpin 


5.8 


Mm.29940 


ESTs 


NA 


None 


5.8 


AU043625 


expressed sequence AU043625 


NM 133910 


None 


5.8 


Col4al 


procollagen, type IV, alpha 1 


none 


Basement membrane 
Collagen Connective tissue 
Extracellular matrix 
Glycoprotein 
Hydroxylation Repeat 
Signal 


5.6 


lgh-4 


immunoglobulin heavy chain 4 
(serum IgG I ) 


none 


Alternative splicing 
Glycoprotein 
Immunoglobulin C region 
Immunoglobulin domain 


5.5 


Siat6 


sialyltransferase 6 (N- 
acetyllacosaminide alpha 2,3- 
sialyltransferase) 


NM 009176 


Glycoprotein 
Glycosyltransferase Golgi 
stack Signal-anchor 
Transferase 
Transmembrane 


5.4 


Igk-C 


immunoglobulin kappa chain, 
constant region 


none 


None 


5.4 


Sdpr 


Serum deprivation response 


NM 138741 


None 


5.4 


Duspl 


dual specificity phosphatase 1 


NM 013642 


Cell cycle Hydrolase 


5.3 


Cited2 


Cbp/p300-interacting transact i vator, 
with Glu/ Asp-rich carboxy-terminal 
domain, 2 


NM 010828 


Alternative splicing 
Nuclear protein 


5.2 


Epor 


erythropoietin receptor 


NM 010149 


Glycoprotein Receptor 
Signal Transmembrane 


5.1 


Min.200980 


Mus musculus, Similar to 
translocation nrotein 1 clone 
IMAGE:5347105, mRNA, partial cds 


NA 


None 


5.0 


Atf2 


activating transcription factor 2 


none 


Activator Alternative 
splicing DNA-binding 
Metal-binding Nuclear 
protein Phosphorylation 
Transcription regulation 
Zinc-finger 


5.0 


Ccnel 


cyclin El 


NM 007633 


Cell cycle Cell division 
Cyclin Nuclear protein 
Phosphorylation 


5.0 


M1U3 


myeloid/lymphoid or mixed lineage- 
leukemia translocation to 3 homolog 
(Drosophila) 


NM 027326 


None 


4.9 


D5Ertd40e 


DNA segment, Chr 5, ERATO Doi 
40, expressed 


none 


None 


4.9 


Zfp216 


zinc finger protein 216 


NM 009551 


None 


4.8 


Syp 


synaptophysin 


NM 009305 


Calcium-binding 
Glycoprotein Nerve 
Phosphorylation Repeat 
Synapse Synaptosome 
Transmembrane 


4.8 


Nedd4 


neural precursor cell expressed, 
developmental ly down-regulted gene 
4 


NM 010890 


Ligase Repeat Ubiquitin 
conjugation 


4.7 


Pbxl 


pre B-cell leukemia transcription 
factor 1 


NM 008783 


None 


4.7 


6330407GllRik 


RIKEN cDNA 6330407G1 1 gene 


NM 023423 


None 


4.6 


Ashl 


absent, small, or homeotic discs 1 
(Drosophila) 


NM 138679 


None 


4.5 


Lrmp 


lymphoid-restricted membrane 
protein 


NM 008511 


None 


4.5 


Casp8ap2 


caspase 8 associated protein 2 


NM 011997 


None 


4.5 
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Mm.30163 


Mus musculus, clone 
IMAGE:4952607, mRNA 


NA 


None 


4.5 


Ctsl 


cathepsin L 


NM 009984 


Glycoprotein Hydrolase 
Lysosome Signal Thiol 
protease Zymogen 


4.5 


Sfpq 


splicing factor proline/glutamine rich 
(polypyrimidine tract binding protein 
associated) 


NM 023603 


None 


4.4 


201 0004 A03Rik 


RIKEN cDNA 201 0004 A03 gene 


none 


None 


4.3 


Car2 


carbonic anhydrase 2 


NM 009801 


Lyase Zinc 


4.2 


Mm.22896 


ESTs 


NA 


None 


4.1 


AI573938 


expressed sequence AI573938 


none 


None 


3.9 


Vasp 


vasodilator-stimulated phosphoprotein 


none 


Actin-binding 
Phosphorylation 


3.9 


AA408451 


expressed sequence AA40845 1 


AA408451 


None 


3.7 


Pftkl 


PFTAIRF nrntein kinase 1 


NM 011074 


None 


3.6 


Tieg 


TGFB inducible early growth 
response 


NM 013692 


DNA-binding Metal- 
binding Nuclear protein 
Repeat Repressor 
Transcription regulation 
Zinc-finger 


3.6 


Igk-V28 


immunoglobulin kappa chain variable 
28 (V28) 


none 


Immunoglobulin C region 
Immunoglobulin domain 


3.6 


Mm.1806 


Mus musculus, Similar to K1AA1404 
protein, clone IMAGE: 5252426, 
mRNA, partial cds 


NA 


None 


3.5 


Mm 25 1 1 5 


ESTs 


NA 


None 


3.5 


Ccm41 


{~ , {" , R4 oarhnn oatnholitp rpnrp^fon 4- 

like (S. cerevisiae) 


none 


Biological rhythms 


3.5 


Cpo 


coproporphyrinogen oxidase 


NM 007757 


Hpiiip hin^vnthp^i^ Iron 
Mitochondrion 
Oxidoreductase Porphyrin 
biosynthesis Transit 
peptide 


3.5 


Nuprl 


nuclear protein 1 


NM 019738 


None 


3.5 


Mm.5510 


similar to gene overexpressed in 
astrocytoma f Homo sapiens] 


NA 


None 


3.4 


Rab33b 


RAB33B, member of RAS oncogene 
family 


NM 016858 


Golgi stack GTP-binding 
Lipoprotein Prenylation 
Protein transport 


3.4 


9430065 LI 9Rik 


RIKEN cDNA 9430065L19 gene 


NM 146083 


None 


3.4 




progesterone receptor 


NM 008829 


DNA-binding Nuclear 
protein Receptor Steroid- 
binding Transcription 
regulation Zinc-finger 


3.4 


LOC2 1 8490 


similar to Transcription factor BTF3 
(RNA polymerase B transcription 
factor 3) 


NM 145455 


Alternative splicing 
Nuclear protein 
Transcription regulation 


3.4 


4930434 H03Rik 


RIKEN cDNA 4930434H03 gene 


none 


None 


3.3 


Actn3 


Actinin alpha 3 


NM 013456 


Actin-binding Multigene 
family Repeat 


3.3 


Mm.20231 1 


Mus musculus, clone 

IMAGE: 1379624, mRNA, partial cds 


NA 


GTP-binding Lipoprotein 
Membrane Multigene 
family Palmitate 
Transducer 


3.3 


Gtpi 


interferon-g induced GTPase 


NM 019440 


None 


3.3 


Nat2 


N-acetyltransferase 2 (arylamine N- 
acetyltransferase) 


NM 010874 


Acyltransferase Multigene 
family Polymorphism 
Transferase 


3.3 


Eya2 


eyes absent 2 homolog (Drosophila) 


none 


Alternative splicing 
Developmental protein 
Multigene family 


3.3 


U10037N09Rik 


RIKEN cDNA 1 1 10037N09 gene 


none 


None 


3.2 


503341 4 D02Rik 


RIKEN cDNA 5033414D02 gene 


NM 026362 


None 


3.1 


Mm.26147 


ESTs 


NA 


None 


3.1 
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114 


interleukin 4 


NM 021283 


B-cell activation Cytokine 
Glycoprotein Growth 
factor Signal 


3.1 


Ubapl 


ubiquitin-associated protein I 


NM 023305 


None 


3.1 


Acox 1 


acyl-Coenzyme A oxidase 1 , 
palmitoyl 


NM 015729 


FAD Fatty acid 
metabolism Flavoprotein 
Oxidoreductase 
Peroxisome 


2.9 


Ccl5 


chemokine (C-C motif) ligand 5 


NM 013653 


Chemotaxis Cytokine 
Inflammatory response 
Signal T-cell 


2.9 


AW457192 


expressed sequence AW457192 


NM 134084 


Cyclosporin Isomerase 
Mitochondrion Multigene 
family Rotamase Transit 
peptide 


2.9 


2610016KllRik 


RIKEN cDNA 2610016K1 1 gene 


none 


None 


2.8 


Fzd4 


frizzled homolog 4 (Drosophila) 


NM 008055 


Developmental protein G-protein coupled receptor
Glycoprotein Multigene 
family Signal 
Transmembrane 


2.8 


Pla2g4a 


phospholipase A2, group IVA 
(cytosolic, calcium-dependent) 


NM 008869 


Calcium Hydrolase Lipid
degradation Phosphorylation


2.8 




lnucriii 


MM DOOM? 




2.7 


NA 


AV239653 Mus musculus cDNA, 3' end /clone=4732435F04
/clone_end=3' /gb=AV239653
/gi=6192160 /ug=Mm.88313

probe set(s) 96411_f_at on MG-U74A
mRNA


AV239653 


None 


2.7 


Tcf12


transcription factor 12 


NM 011544 


Alternative splicing

Developmental protein 
DNA-binding Nuclear 
protein Transcription 
regulation 


2.7 


Madh7


MAD homolog 7 (Drosophila)


NM 008543 


Alternative splicing
Multigene family
Transcription regulation


2.7 


Gem 


GTP binding protein (gene
overexpressed in skeletal muscle)


NM 010276 


GTP-binding Membrane
Phosphorylation 


2.7 


Tpm1


tropomyosin 1, alpha


NM 024427 


3D-structure Acetylation
Alternative splicing Coiled 
coil Multigene family 
Muscle protein 
Phosphorylation Repeat 


2.7 


Map17


membrane-associated protein 17


NM 026018 


None 


2.7 


Dcx 


doublecortin 


NM 010025 


Neurogenesis Neurone 
Phosphorylation Repeat 


2.7 


Igk-V28 


immunoglobulin kappa chain variable 
28 (V28) 


none 


Immunoglobulin C region 
Immunoglobulin domain 


2.6 


Rnf11


ring finger protein 11


NM 013876 


None 


2.6 


Nfix 


nuclear factor I/X


NM 010906 


None 


2.6 


Lin7c


lin 7 homolog c (C. elegans) 


NM 011699 


None 


2.5 


Cln3 


ceroid lipofuscinosis, neuronal 3, 
juvenile (Batten, Spielmeyer-Vogt 
disease) 


NM 009907 


Glycoprotein Lysosome 
Transmembrane 


2.5 


Hhex 


hematopoietically expressed 
homeobox 


NM 008245 


Developmental protein 
DNA-binding Homeobox 
Nuclear protein 


2.5 


Gab1


growth factor receptor bound protein
2-associated protein 1


NM 021356 


None 


2.5 


None 


none 


none 


None 


2.5 


Kcnj3


potassium inwardly-rectifying channel,
subfamily J, member 3


NM 008426


Ion transport Ionic channel
Potassium transport
Transmembrane Voltage-gated
channel


2.5




Cradd 


CASP2 and RIPK1 domain
containing adaptor with death domain 


NM 009950 


Apoptosis 


2.5 


Mm.29914 


ESTs 


NA 


None 


2.4 


Fos 


FBJ osteosarcoma oncogene 


NM 010234 


DNA-binding Nuclear 
protein Phosphorylation 
Proto-oncogene 


2.4 


Mm.24247 


ESTs 


NA 


None 


2.4 


4930472G13Rik


RIKEN cDNA 4930472G13 gene 


NM 029447 


None 


2.4 


Ormdl3 


ORM1-like 3 (S. cerevisiae)


NM 025661 


None 


2.4 


Umpk 


uridine monophosphate kinase 


none 


Kinase Transferase 


2.4 


Creg 


cellular repressor of E1A-stimulated
genes


NM 011804 


None 


2.4 


Utrn 


utrophin 


none 


None 


2.3 


Mm.27769 


ESTs, Weakly similar to RIKEN
cDNA 0610011E17 [Mus musculus]
[M. musculus]


NA 


None 


2.3 




interferon gamma induced GTPase 


NM 018738 


None 


2.3 


Arg2 


arginase type II 


NM 009705 


Arginine metabolism 

rijr ui ui<ssvs ivi ui ig,<Ji i wt 

Mitochondrion Transit 
peptide Urea cycle 


2.3 


Pklr 


pyruvate kinase liver and red blood 
cell 


NM 013631 


Alternative splicing 
Glycolysis Kinase 
Magnesium Multigene 
family Phosphorylation 
Transferase 


2.2 


1810010A06Rik


RIKEN cDNA 1810010A06 gene


NM 016921 




2.2 


Mm.532 


ESTs, Weakly similar to
lysophospholipase 1; phospholipase
1a; lysophospholipase 1 [Mus
musculus] [M. musculus]


NA 


None 


2.2 


Vamp5 


vesicle-associated membrane protein 
5 


NM 016872 


Multigene family 
Myogenesis Signal-anchor 
Transmembrane 


2.2 


0710001O03Rik 


RIKEN cDNA 0710001O03 gene


NM 146094 


None 


2.2 


2610003J05Rik 


RIKEN cDNA 2610003J05 gene 


none 


None 


2.2 


Tde1l


tumor differentially expressed 1, like


NM 019760 


None 


2.2 


Serpinf1


serine (or cysteine) proteinase
inhibitor, clade F, member 1


NM 011340 


Glycoprotein Serpin Signal 


2.1 


Scotin 


scotin gene 


NM 025858 


None 


2.1 


G3bp2 


Ras-GTPase-activating protein 
(GAP<120>) SH3-domain binding 
protein 2 


NM 011816 


None 


2.1 


1190002H23Rik


RIKEN cDNA 1190002H23 gene


NM 025427 


None 


2.1 


Nsccn1


non-selective cation channel 1 


NM 010940 


None 


2.1 


Tgoln2 


trans-golgi network protein 2 


NM 009444 


None 


2.1 


Ywhae 


tyrosine 3-monooxygenase/tryptophan 
5-monooxygenase activation protein, 
epsilon polypeptide 


NM 009536 


None 


2.1 


4631408O11Rik


RIKEN cDNA 4631408O11 gene


none 


None 


2.1 


Pou2af1


POU domain, class 2, associating 
factor 1 


NM 011136 


Nuclear protein 
Transcription regulation 


2.1 


Mm.220953 


Mus musculus, clone 
IMAGE:4206769, mRNA 


NA 


None 


2.1 


Casp6 


caspase 6 


NM 009811 


Apoptosis Hydrolase Thiol 
protease Zymogen 


2.0 


None 


none 


none 


Glycoprotein 
Immunoglobulin C region 
Immunoglobulin domain 


2.0 


Nr4a1


nuclear receptor subfamily 4, group 
A, member 1 


NM 010444 


DNA-binding Nuclear
protein Phosphorylation
Receptor Transcription
regulation Zinc-finger


2.0




1700023O11Rik


RIKEN cDNA 1700023O11 gene


NM 029339 


None 


2.0 


Brca2 


breast cancer 2 


NM 009765 


Polymorphism Repeat 


2.0 


H2-T22 


histocompatibility 2, T region locus 
22 


NM 010397 


None 


2.0 



Table 4 Genes With Upregulated Expression and Correlated Stem Cell Activity 



Symbol or Acc. No. 


Gene Description or similarity 
to known proteins 


Correlation to stem cell activity


Unigene No. 


Rnps1


ribonucleic acid binding
protein S1


1.000 


Mm.1951 


Junb 


Jun-B oncogene 


1.000 


Mm.1167


Hdac3 


histone deacetylase 3 


1.000 


Mm.20521 


Irf6


interferon regulatory factor 6


1.000


Mm.4179 


Gata3 


GATA binding protein 3 


0.997 


Mm.606 


Xbp1


X-box binding protein 1 


0.993 


Mm.22718 


Cited2 


Cbp/p300-interacting
transactivator, with Glu/Asp-rich
carboxy-terminal domain, 2


0.992 


Mm.9524 


Nmyc1


neuroblastoma myc-related 
oncogene 1 


0.986 


Mm.16469


Zfp292 


zinc finger protein 292 


0.975 


Mm.38193 


Bdkrb1


bradykinin receptor, beta 1 


1.000 


Mm.57076 


Map17


membrane-associated protein 
17 


0.995 


Mm.30181 


Ormdl3 


ORM1-like 3 (S. cerevisiae)


0.990


Mm.180546


Fzd4 


frizzled homolog 4 
(Drosophila) 


0.988 


Mm.68712 


Lgi4


leucine-rich repeat LGI 
family, member 4 


0.961 


Mm.1662


Bdkrb1*


bradykinin receptor, beta 1 


1.000 


Mm.57076 


Socs2 


suppressor of cytokine 
signaling 2 


0.996 


Mm.4132 


Fzd4* 


frizzled homolog 4 
(Drosophila) 


0.988 


Mm.68712 


Kit* 


kit oncogene 


0.961 


Mm.4394


Inpp5d 


inositol polyphosphate-5-phosphatase D


0.958 


Mm.15105 


Fbxo9 


F-box only protein 9


1.000 


Mm.28584 


Nedd4 


neural precursor cell 
expressed, developmentally 
down-regulated gene 4


0.993


Mm.16553


Rnf11


ring finger protein 11


0.992 


Mm.25228 


Ian1


immune associated nucleotide 
1 


0.999 


Mm.28395 


Iigp 


interferon-inducible GTPase


0.997 


Mm.29008 


Ifi47 


interferon gamma inducible 
protein 


0.984 


Mm.24769 


Tgtp 


T-cell specific GTPase 


0.994 


Mm.15793


Igtp 


interferon gamma induced 
GTPase 


0.993 


Mm.858 


Gtpi 


interferon-g induced GTPase 


0.989 


Mm.33902 


Serpinb6a 


serine (or cysteine) proteinase 
inhibitor, clade B, member 6a 


0.996 


Mm.2623 


Serpina3g 


serine (or cysteine) proteinase 
inhibitor, clade A, member 3G 


0.987 


Mm.15085


Camk2b 


calcium/calmodulin-dependent 
protein kinase II, beta


0.999 


Mm.4857 


Gab1


growth factor receptor bound
protein 2-associated protein 1


0.997


Mm.24573






Gabarapl1


gamma-aminobutyric acid
(GABA(A)) receptor-
associated protein-like 1


0.997


Mm.14638


Mtmr13


myotubularin related protein 
13 


0.996 


Mm.200250 


Mt2 


metallothionein 2 


0.999 


Mm.147226


Car2 


carbonic anhydrase 2 


0.995 


Mm.1186 


Cdkn1c


cyclin-dependent kinase
inhibitor 1C (P57)


0.986


Mm.168789


Lcn7


lipocalin 7 


0.999 


Mm.15801 










A430017F18 


No similar gene 


1.000 


Mm.44883 


AU044919 


No significant similar gene 


1.000


Mm.14438


2310075M17Rik


Similar to S3543 GTP-binding 
protein (90%) 


0.999 


Mm.196592


Ell2


Eleven-nineteen lysine-rich 
leukemia gene 2 


0.998 


Mm.21288 


LOC207685 


Hypothetical protein 


0.998 


Mm.38214 


2310061I04Rik


No similar gene 


0.998 


Mm.5624 


5830431A10Rik


Contains Cor1/Xlr/Xmr
conserved region


0.997


Mm.1148


2700007P21Rik 


Unknown protein 


0.997 


Mm.3587 


B930086G17 


No similar gene 


0.992 


Mm.24738 


2410166I05Rik


Hypothetical protein 


0.989 


Mm.30153 


D10Ertd749e 


Similar to ZW10 interacting
protein-1


0.986 


Mm.38994 


2210023F24Rik


Contains B-box Zn-finger and
SPRY domain 


0.983 


Mm.5510 


Riken 4237666 


No significant similar gene 


0.978 


Mm.276231 


6230421P05Rik


No similar gene 


0.978 


Mm.26147 


4631408O11Rik*


No significant similar gene 


0.964 


Mm.2935 


1110054N06Rik*


Unknown protein with ankyrin
repeat


0.960 


Mm.15351 



Table 5 Genes down-regulated in CD38+CD34- Cells 



Symbol or Acc. No. 


Description 


Correlation to SC 
activity 


Unigene No. 


Satb1


Special AT-rich sequence 
binding protein 1 


0.955 


Mm.4381 


Ptpro 


Protein tyrosine phosphatase, 
receptor type, O 


0.999 


Mm.4715 


Sell 


Selectin, lymphocyte 


0.988 


Mm.1461


Ccl9 


Chemokine (C-C motif) ligand 
9 


0.988 


Mm.2271 


Cnn3 


Calponin 3, acidic 


0.988 


Mm.22171 


Lgals3 


Lectin, galactose binding, 
soluble 3 


0.971 


Mm.2970 


Mki67 


Antigen identified by 
monoclonal antibody Ki 67 


0.998 


Mm.4078 


Bin1


Bridging integrator 1 


0.977 


Mm.4383 


Sult4a1


Sulfotransferase family 4A,
member 1


1.000


Mm.20451


Hdc 


Histidine decarboxylase 


0.996 


Mm.18603


AI132321 


Contains phospholipase D
active site motif


-1.000


Mm.203915 


2610036L13Rik


No similar gene 


-1.000 


Mm.23526 


BC018347


Similar to translation initiation
factor IF-2


-1.000


Mm.154309


X90778 


Similar to Histone H2B 


-1.000 


Mm.21579 


AW060549 


Similar to Retrovirus-related 
POL polyprotein


-0.999 


Mm.29177 
X67863


Similar to Octapeptide-repeat 
protein T2 


-0.995 


Mm.35868 


X15378


Similar to Myeloperoxidase 
and Eosinophil peroxidase 
precursor 


-0.975 


Mm.4668 


Plac8 


Uncharacterized Cys-rich 
domain containing protein 


-0.960 


Mm.34609 


D13Ertd275e 


Hypothetical protein 


-0.952 


Mm.21231



Table 6. Classification and Characterization of Genes Upregulated in Mouse HSCs



Class 


Name 


Sequence Description 


Sequence 
Code 


Unigene
Code


Protein 
ID 




Birc5 


baculoviral IAP repeat-containing 5


1 U 1 JZ 1 


Mm.8552


O70201




opill 


crxtnrl 1 1 n 

apuiuun 


77JOJ 


Mm 49 1 Q1 

ivim.'tz 1 7j 




h ret m n <io m a 1 


Dlg1


IVI. UlUoCUIUo Ulg 1 IIlI\l>(/\. 


Ql Iftd 

7J 1 U*T 




r J 1 OU / 


V^l 11 Ul 1 IvJOUl 1 lul 


Calm2 


\yfnc TTii ten l ft ic f* *"i 1 mnHi 1 1 in c\mtViAcic {(~*r%\A\ 
IVlUo IIlUoCUlUo LcllllIOUUllll byiHIlCbli ^v^dlVl f 






r UZ J7J 


Enzyme 


Ctsl 


cathepsin L 


101963 


Mm.930 


P06797 


Enzyme


Gdi1


guanosine diphosphate (GDP) dissociation
inhibitor 1




Y4m "?ft^S1ft 
ivi m. ZU JO j U 


P50396


Enzyme 


Hadh2 


hydroxysteroid (17-beta) dehydrogenase 10


101045 


Mm.6994 


O08756


enzyme 


Mt2


Mouse metallothionein II (MT-II) gene.


1 A 1 Cfl 1 




P02798


Enzyme


Pnp 


purine-nucleoside phosphorylase


Q19QA 

7JZ7U 


Mm 1 


P23492


Enzyme


Vdu1


Vhlh-interacting deubiquitinating enzyme 1


i (Lf\n 1 ft 
1 OU / 1 U 


Mm O/llfll 




Kinase 


Csnkle 


casein kinase 1, epsilon 


97925 


Mm.30199 


Q9QUI3 


Kinase


Nme3 


expressed in non-metastatic cells 3 


QA QO 1 • 

y4ys i i 


Pl>4. n T7T70 

Mm. 2/2 /a 




Lectin 


Lgals9


lectin, galactose binding, soluble 9 




Ivlm. I aUo / 


O08573


Metabolism 


Aldh1a1


aldehyde dehydrogenase family 1,
subfamily A1


100068 


Mm.4514 


P24549 


Metabolism 


Aldh1a7


aldehyde dehydrogenase family 1,
subfamily A7




ivim. i4ouy 


O35945


Metabolism 


Cpo 


coproporphyrinogen oxidase 


98505_i


Mm.35820 


P36552 


Metabolism 


Cpo 


coproporphyrinogen oxidase


VoDUO r 


Mm.35820


P36552


Metabolism


Ech1


enoyl coenzyme A hydratase 1, peroxisomal 


yj 04 


ivim.z I I z 




Metabolism


ivnup 1 


ivi.mubcuiub ivi i K^r-i gene. 


1 ftlftAI 




1^0 I y\JQ 




Rbmx 


RNA binding motif protein, X chromosome 


y /o4o 


ivim.Zoz /j 


V<iyKU YU 


1>I UV/ICal 


Snrpa 


small nuclear ribonucleoprotein polypeptide 
A 


1 ftft 1 ft 1 
i UU I U 1 


ivim.'fOj j 


v,?oz i ©y 


Secreted 


Iap


intracisternal A particles 


97181_f


Mm.212712 


P03975 




Tff2


Mus musculus spasmolytic polypeptide 
(mSP) gene, complete cds. 


VJJUZ 




yUJ4U4 


Signaling


Gnb4 


guanine nucleotide binding protein, beta 4 


93949 


Mm.9136


P29387 


Signaling 


Tsc2 


tuberous sclerosis 2 


97953_g


Mm.30435 


Q61037 


Structural 


Fscn1


fascin homolog 1, actin bundling protein
(Strongylocentrotus purpuratus)


92838


Mm.13194


Q61553


Transcription


Irf1


Interferon regulatory factor 1 


102401 1 


Mm.1246


P15314 


Transcription 


Cited2 


Cbp/p300-interacting transactivator, with
Glu/Asp-rich carboxy-terminal domain, 2


101973 


Mm.9524 


O35740


Transcription 


Ncorl 


nuclear receptor co-repressor 1 


101536 


Mm.88061


Q60974


Transcription 


Sox6 


SRY-box containing gene 6 


92726 


Mm.4656 


P40645 


Transcription 


Hhex 


Mus musculus Hex(Prh) gene, exon4 and 
complete cds. 


98408 


Mm.33896 


Q9R1X2 


Transcription 


Trim30 


tripartite motif protein 30 


98030 


Mm.3288 


P15533


Transcription 


Tieg 


TGFB inducible early growth response 


99602 


Mm.4292 


O89091


Transcription 


Klf2 


Kruppel-like factor 2 (lung) 


96109 


Mm.26938 


Q60843


Transcription 


Eif4a2 


eukaryotic translation initiation factor 4A2 


93089 


Mm.16323


P10630


Transcription 


H2a-615


Mus musculus histone H2a.2-615 (H2a-615)
and histone H3.2-615 (H3-615)
genes, complete cds.


93068_r 




P20670 


Transcription 


Nfe2l2


Mus musculus p45 NF-E2 related factor 2
(NRF2) gene, exon 2 to exon 5 and
complete cds.


92562




Q60795








Transcription 


Fli1


Friend leukemia integration 1 


94698 


Mm.119781


P26323 


Transcription 


Mcmd5


mini chromosome maintenance deficient 5 
(S. cerevisiae) 


100156 


Mm.5048 


P49718 


Transcription 


H3f3b


H3 histone, family 3B 


100708 


Mm.18516 


P06351


Transcription 


Rev3l


REV3-like, catalytic subunit of DNA
polymerase zeta RAD54 like (S. cerevisiae)


103457 


Mm.2167 


Q61493 


Transcription 


Hoxb5 


homeo box B5 


103666 


Mm.207 


P09079 


Transcription 


Pbx1


pre B-cell leukemia transcription factor 1 


94804 


Mm.221246


P41778 


Transcription 


Zfp36l1


zinc finger protein 36, C3H type-like 1 


93324 


Mm.18571 


P23950 


Transcription 


Myb 


myeloblastosis oncogene 


92644_s


Mm.1202


P06876 


Transcription 


Sp4 


trans-acting transcription factor 4 


92992_i


Mm.5073 




Transcription 


Idb2 


Mus musculus helix-loop-helix protein Id2 
gene, 3' region. 


93013 






Transmembrane 


Hiat1


hippocampus abundant gene transcript 1 


160447 


Mm.3792 


P70187 


Transmembrane 


Igh-4 


Mouse gene for the constant part of
gamma-1 immunoglobulin.


101870 




P01869 


Transmembrane 


Ii


Ia-associated invariant chain


101054 


Mm.7043 


P04441 


Transmembrane 


H2-Aa 


histocompatibility 2, class II antigen A, 
alpha 


92866 


Mm.175310 


P23150 


Transmembrane 


Epor 


Mouse gene for erythropoietin receptor. 


103997 




P14753


Transmembrane 


Irs2 


Mus musculus insulin receptor substrate-2 
(Irs2) gene, partial cds. 


92205 




088970 


Transmembrane 


H2-Eb1


histocompatibility 2, class II antigen E beta


94285 


Mm.22564 


061857 


Transmembrane 


Tnfrsf17


tumor necrosis factor receptor superfamily,
member 17


94190


Mm.12935


O88472


Transmembrane 


Adcy9 


adenylate cyclase 9 


92527 


Mm.4294 


P51830 


Transmembrane 


Edg1


endothelial differentiation sphingolipid G- 
protein-coupled receptor 1 


161788_f 


Mm.982 




Transmembrane 


Fzd4 


frizzled homolog 4 (Drosophila) 


93459_s


Mm.68712 




Transport 


Vps35 


vacuolar protein sorting 35 


92640 


Mm.196201


Q9EQH3 


Transport 


Hbb-b2 


Mouse gene for beta-1-globin.


103534 




P02089 


Transport 


Kpnb1


karyopherin (importin) beta 1


93111


Mm.16710


P70168 


Transport 


Rab9 


RAB9, member RAS oncogene family 


95516 


Mm.25306 


Q9R0M6


Transport 


Rac1


RAS-related C3 botulinum substrate 1 


101555 


Mm.889 


P15154 


Transport 


Rab33b 


Mus musculus DNA for Rab33B, exon 2 
and complete cds. 


103062 




O35963


Zinc Finger 


Zfp216 


zinc finger protein 216


160321


Mm.2904


O88878


Zinc Finger 


Rnf11


ring finger protein 11


160205_f


Mm.25228


Q9QYK7


Zinc Finger 


Nbr1


next to the Brca1


101484 


Mm.784 


P97432 


Zinc Finger 


pol 


Mus musculus clone MIA 14 full-length 
intracisternal A-particle gag protein gene, 
complete cds; and pol pseudogene, 
complete sequence. 


93907_f 




P11365


Zinc Finger 


Gfi1b


growth factor independent 1B


102260


Mm.10804


O70237


Zinc Finger 


Car1


carbonic anhydrase 1 


98098 


Mm.3471 


P13634




Cul4a 


cullin 4A 


104288 


Mm.22276 






D7Wsu128e


DNA segment, Chr 7, Wayne State 
University 128, expressed 


103861_s 


Mm.21103 






Rhced 


Rhesus blood group CE and D 


103340 


Mm.195461


O9QX04 




AU044919 


expressed sequence AU044919 


102823 


Mm.14438






Igj


immunoglobulin joining chain 


102372 


Mm.1192






Lisch7 


liver-specific bHLH-Zip transcription factor 


162274_f


Mm.4067 






Igh-VJ558


immunoglobulin heavy chain (J558 family)


161486_f


Mm.157783






0910001L24 
Rik 


RIKEN cDNA 0910001L24 gene


161243_f 


Mm.22637 






Txnip 


thioredoxin interacting protein 


160547_s


Mm.77432 






Dr1


down-regulator of transcription 1 


160449 


Mm.38184 






4933429H19 
Rik 


Mus musculus, Similar to translocation 
protein 1, clone IMAGE:5347105, mRNA,
partial cds 


160136_r 


Mm.200980 






1500010B24 
Rik 


RIKEN cDNA 1500010B24 gene


160111 


Mm.65264 








Mus castaneus IgK chain gene, C-region, 3' end.


102156_f










AA409749 


expressed sequence AA409749 


100742 


Mm.3628 






D2Ertd63e 


DNA segment, Chr 2, ERATO Doi 63, 
expressed 


95862 


Mm.24965 






Igk-V28


Mus musculus anti-HIV-1 reverse
transcriptase single-chain variable fragment 
mRNA, complete cds 


100322 


Mm.220154 






5830431A10
Rik


RIKEN cDNA 5830431A10 gene


94136


Mm.1148






Igl-V1


Mouse Ig active lambda-1-chain C-region
gene, 3' end.


93638_s 








Imap38 


immunity-associated protein, 38 kDa 


92489 


Mm.197478


P70224 




92316_f 


Mouse germline Ig lambda-2-chain
C-region gene, 3' end.


92316_f








2700007P21 
Rik 


RIKEN cDNA 2700007P21 gene 


92268 


Mm.3587 






104477 


ESTs 


104477 


Mm.29940 






0610012A05
Rik


RIKEN cDNA 0610012A05 gene 


104206 


Mm.27619 






Atp6s1


Mus musculus, clone MGC:37615
IMAGE:4989784, mRNA, complete cds


103699_i


Mm.222723 






Gbp3 


guanylate nucleotide binding protein 3 


103202 


Mm.1909






immunoglobulin
V region


Mouse mRNA for immunoglobulin
gamma-3 V-D-J region and secreted constant
region, complete cds.


102721 








AI256744 


Mus musculus, clone IMAGE:3500612, 
mRNA, partial cds 


102233 


Mm.1043






Ptdss1


phosphatidylserine synthase 1 


101931 


Mm.9440 


055024 




Gga1


golgi associated, gamma adaptin ear 
containing, ARF binding protein 1 


98445 


Mm.34525 






4121402D02 
Rik 


RIKEN cDNA 4121402D02 gene 


97935 


Mm.30252 






Iigp


interferon-inducible GTPase 


96764 


Mm.29008 


Q9Z1M3 




2310022K15 
Rik 


RIKEN cDNA 2310022K15 gene 


95622 


Mm.28047 






Vcl 


vinculin 


94963 


Mm.12842






2610319K07 
Rik 


RIKEN cDNA 2610319K07 gene


104744 


Mm.200479 






lea 


Mouse Ig germline D-J-C region alpha gene 
and secreted tail; Mouse germ line gene for 
immunoglobulin alpha H constant part 
(coding for the last three exons) 


100583 








Prpf8 


pre-mRNA processing factor 8 


98574 


Mm.3757 






Scotin 


scotin gene 


95102 


Mm.196533 






1110035L05
Rik


RIKEN cDNA 1110035L05 gene


95052 


Mm.29140 






3110001A13
Rik


RIKEN cDNA 3110001A13 gene


96640 


Mm.200627 






Vps26 


vacuolar protein sorting 26 (yeast) 


96665 


Mm.27373 






mu-immunoglobulin


Mouse germ line gene fragment for
mu-immunoglobulin C-terminus (secreted form).


93583_s 








H19


M. musculus H19 mRNA. 


93028 




061638 




Car2 


carbonic anhydrase 2 


92642 


Mm.1186






Rae1


RAE1 RNA export 1 homolog (S. pombe) 


160466 


Mm.4113 






Map1lc3


microtubule-associated protein 1 light chain 
3 


160288 


Mm.28357 






1700008C22
Rik


RIKEN cDNA 1700008C22 gene


160123


Mm.177990






98254_f 


un98f06.x1 NCI_CGAP_Mam6 Mus
musculus cDNA clone IMAGE:2581955 3'
similar to gb:M10062 Mouse IgE-binding
factor mRNA, complete cds (MOUSE);
mRNA sequence.


98254_f 








Eef2 


eukaryotic translation elongation factor 2 


97559 


Mm.27818 


06 1509 
Igk-V28


immunoglobulin kappa chain variable 28 
(V28) 


99405 


Mm.104747






9030022E12 
Rik 


RIKEN cDNA 9030022E12 gene


104198 


Mm.27519 






D18362


expressed sequence D18362


103206 


Mm.205433 






Hey1


Mus musculus 6 days neonate head cDNA,
RIKEN full-length enriched library,
clone:5430408K11:hairy/enhancer-of-split
related with YRPW motif 1, full insert
sequence 


101913 


Mm.222825 






shrm 


shroom 


100024 


Mm.46014 






AW547365 


expressed sequence AW547365 


97425 


Mm.30015 






D8Ertd69e 


DNA segment, Chr 8, ERATO Doi 69, 
expressed 


94922_i 


Mm.26609 






Frap1


FK506 binding protein 12-rapamycin 
associated protein 1 


104708 


Mm.21158 






4933434E20 
Rik 


RIKEN cDNA 4933434E20 gene


104038 


Mm.21451 






1810009A16 
Rik 


RIKEN cDNA 1810009A16 gene


104041 


Mm.21458 






Pex11a


peroxisomal biogenesis factor 11a


103660


Mm.20615


Q9Z211 




AU044919 


expressed sequence AU044919 


102824_g


Mm.14438






MGC29044 


hypothetical protein MGC29044 


102375 


Mm.1196






Mkrn1


makorin, ring finger protein, 1


101070 


Mm.7198 






LOC207933 


similar to Isopentenyl-diphosphate delta-
isomerase (IPP isomerase) (Isopentenyl 
pyrophosphate isomerase) 


96269 


Mm.29847 






Elp3 


elongation protein 3 homolog
(S. cerevisiae)


95717 


Mm.29719 






Add1


adducin 1 (alpha) 


94535 


Mm.29052 






Pbef 


pre-B-cell colony-enhancing factor 


94461 


Mm.28830 






4930588A18 
Rik 


Mus musculus, clone IMAGE:4457493, 
mRNA 


96717 


Mm.233830 






Dad1


Mus musculus Defender against Apoptotic
Death (Dad1) gene, exon 3.


96008 








2410015A15 
Rik 


RIKEN cDNA 2410015A15 gene


95433 


Mm.24495 






Xbp1


X-box binding protein 1 


94821 


Mm.22718 






Net1


neuroepithelial cell transforming gene 1 


94223 


Mm.22261 


Q9ZIL7 




Igk-V28


immunoglobulin kappa chain variable 28 
(V28) 


93086 


Mm.104747






LOC218490


similar to Transcription factor BTF3 (RNA 
polymerase B transcription factor 3) 


93057 


Mm.1538






Lamc1


laminin, gamma 1


161706_f


Mm.1249






AI450287 


expressed sequence AI450287


161596_f


Mm.222827 






Sep15


15-kDa selenoprotein


160360 


Mm.29812 






LOC229906 


similar to TRANSCRIPTION INITIATION 
FACTOR IIB (TFIIB) (RNA
POLYMERASE II ALPHA INITIATION 
FACTOR) 


160225 


Mm.27213 






2810043O03
Rik


RIKEN cDNA 2810043O03 gene


98756 


Mm.45532 






96532 


ESTs, Highly similar to nucleolar protein 
GU2 [Mus musculus] [M. musculus]


96532 


Mm.35019 






Myt1l


myelin transcription factor 1-like


96495 


Mm.2523 


P97500 




2010004A03
Rik


RIKEN cDNA 2010004A03 gene


94802 


Mm.35302 






C79248 


expressed sequence C79248 


94689 


Mm.153895 






Mylk 


myosin, light polypeptide kinase 


93482 


Mm.27680 






D1Ertd147e


DNA segment, Chr 1, ERATO Doi 147,
expressed 


93191 


Mm.5572 






R75364 


expressed sequence R75364 


92397 


Mm.89393 






92245 


ESTs, Highly similar to nucleolar protein 
GU2 [Mus musculus] [M. musculus] 


92245 


Mm.35019 
Ctse


Mus musculus cathepsin E gene, exon 1,
partial.


104696 








AA420392 


expressed sequence AA420392 


104670 


Mm.32357 






Acyp2 


acylphosphatase 2, muscle type 


104258 


Mm.28407 






Lrba 


LPS-responsive beige-like anchor 


104264 


Mm.28458 






Dock2 


dedicator of cytokinesis 2


103462 


Mm.2173 






Gabpa 


GA repeat binding protein, alpha 


103440 


Mm.18974






Nrip1


nuclear receptor interacting protein 1 


103288 


Mm.20895 


09Z2K2 




AI225904 


expressed sequence AI225904 


103200 


Mm.1902






98438_f


Mouse Q4 class I MHC gene (exon 5).


98438_f




031220 




2010012D11 
Rik 


RIKEN cDNA 2010012D11 gene


96231


Mm.140243






AU019574


Mus musculus, Similar to hypothetical
protein FLJ11110, clone MGC:11734
IMAGE:3968418, mRNA, complete cds 


96172 


Mm.28395 






9130415E20 
Rik 


RIKEN cDNA 9130415E20 gene 


95020 


Mm.40620 






95021 


Mus musculus, clone IMAGE:4502890, 
mRNA 


95021 


Mm.27476 






AW495846 


expressed sequence AW495846 


104549 


Mm.23702 






Gtpbp2 


GTP binding protein 2 


104144 


Mm.22147 






2310050N11
Rik


RIKEN cDNA 2310050N11 gene


104114 


Mm.21954 






Ormdl3 


ORM1-like 3 (S. cerevisiae)


98065


Mm.180546






2610003J05
Rik


RIKEN cDNA 2610003J05 gene


97491 


Mm.31051 






Map17


membrane-associated protein 17


96935


Mm.30181






Gabarapl2 


GABA(A) receptor-associated protein like 2 


96840 


Mm.30017 






2310050K10 
Rik 


RIKEN cDNA 2310050K10 gene


95743 


Mm.29769 






AI182287


expressed sequence AI182287


94469 


Mm.28848 






Nudel 


nuclear distribution gene E-like 


98884_r


Mm.31979 






Cpne1


copine I 


97199 


Mm.27660 






Dnajb9 


DnaJ (Hsp40) homolog, subfamily B, 
member 9 


96680 


Mm.27432 






95488 


Mus musculus, clone IMAGE:3597827, 
mRNA, partial cds 


95488 


Mm.25018 






2700059C12 
Rik 


RIKEN cDNA 2700059C12 gene 


93312 


Mm.18485 






Sdcbp 


syndecan binding protein 


93017 


Mm.14744 


088601 
Table 7. Transmembrane Proteins Enriched in Mouse HSCs



Classification 


Description 


surface antigen 


Histocompatibility 2, class II antigen E beta


receptor 


Gamma-aminobutyric acid (GABA) B receptor, 1 


oncogene 


Myeloproliferative leukemia virus oncogene (TPOR) 


surface antigen 


Histocompatibility 2, class II antigen A alpha 




Cytotoxic T lymphocyte-associated protein 2 beta 


receptor 


Erythropoietin receptor 


oncogene 


Kit oncogene 




Coagulation factor II (thrombin) receptor 




Frizzled homolog 4 (Drosophila) 




Membrane-associated protein 1 7 


surface glycoprotein 


ESTs similar to C211_HUMAN putative surface glycoprotein



Table 8. Transcription Factors Upregulated in Mouse HSCs 



Symbol 


Description 


Fold change 


Accession No. 


Klf2 


Kruppel-like factor 2 (lung) 


44.9 


NM 008452 


Nmyc1


neuroblastoma myc-related
oncogene 1


10.4 


NM 008709 


Zfhx1a


zinc finger homeobox 1a


10.4 


NM 011546 


Gata3 


GATA-binding protein 3 


9.0 


NM 008091 


Tcf15


transcription factor 15


8.6 


NM 009328 


Tal1


T-cell acute lymphocytic 
leukemia 1 


8.3


NM 011527 


Hoxb5


homeo box B5 


7.2 


NM 008268 


Meis1


myeloid ecotropic viral 
integration site 1 


7.1 


NM 010789 


Pbx3b 


Mus musculus transcription 
factor PBX3b 


6.5 


AF020200 


Cited2 


Cbp/p300-interacting
transactivator 2 


5.2 


NM 010828


Atf2 


activating transcription factor 2 


3.6 


none 


Pbx1


pre B-cell leukemia 
transcription factor 1 


4.7 


NM 008783 


None 


chromatin remodeling factor 


4.5 


Mm.24637 


None 


EST similar to PRE-MRNA 
SPLICING FACTOR SRP20 


3.4 


Mm.29915 


Btf3


basic transcription factor 3 


3.2 


none 


Tcf12


transcription factor 12 


2.7 


NM 011544 


Madh7 


MAD homolog 7 (Drosophila) 


2.7 


NM 008543 


Hhex 


hematopoietically expressed 
homeobox 


2.5 


NM 008245 
Example 5. Hierarchical Clustering Analysis of Differentially Expressed Genes

This Example describes a study aimed at determining whether genes differentially
expressed within the HSC compartment are also expressed in other tissues. To perform this
analysis, we compared the expression levels of 210 differentially expressed HSC genes
with a database composed of 45 normal tissues. Hierarchical clustering of these data was
used to group both the tissues and the genes with similar expression patterns. The three HSC
cell subsets formed a distinct branch in this analysis, with the LTR-enriched CD38+CD34- cells
forming a discrete branch separate from the STR cells (CD38+CD34+ and CD38-CD34+). This
clustering pattern is consistent with the stem cell activity pattern within the three subsets.
Importantly, the HSC samples do not cluster near the bone or bone marrow samples, suggesting
that the differentially expressed HSC genes are not simply bone marrow-associated genes. This
analysis also showed that the majority of these genes were not ubiquitously expressed, although
most were expressed at comparable levels in at least one other tissue.
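The clustering step can be illustrated with a minimal, self-contained sketch. The Example does not state which distance metric or linkage rule was used, so this sketch assumes a common choice for expression data: a distance of 1 minus the Pearson correlation between sample profiles, combined with average linkage. The sample names mirror the three HSC subsets and a few tissues, but the expression values are invented toy numbers for illustration only, not data from this study, and the function names `pearson` and `cluster_samples` are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster_samples(profiles, n_clusters):
    """Average-linkage agglomerative clustering of samples (assumed method).

    profiles maps a sample name to its expression values, one value per gene,
    in the same gene order for every sample. Distance is 1 - Pearson r.
    Returns the clusters left when n_clusters is reached and the merge history.
    """
    names = list(profiles)
    dist = {}
    for a, b in combinations(names, 2):
        d = 1.0 - pearson(profiles[a], profiles[b])
        dist[(a, b)] = dist[(b, a)] = d

    def linkage(c1, c2):
        # Average linkage: mean pairwise distance between members of two clusters.
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [frozenset([name]) for name in names]
    merges = []
    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters under average linkage.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((sorted(c1), sorted(c2), linkage(c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
    return clusters, merges


# Toy log-scale values for five hypothetical genes (not study data).
expression = {
    "CD38+CD34-": [9.0, 8.5, 2.0, 1.5, 7.0],
    "CD38+CD34+": [8.5, 8.0, 2.5, 2.0, 6.0],
    "CD38-CD34+": [8.0, 7.5, 3.0, 2.0, 5.5],
    "bone_marrow": [3.0, 2.5, 6.0, 7.0, 4.0],
    "liver": [2.0, 1.5, 8.0, 6.5, 3.0],
    "brain": [2.5, 3.0, 7.0, 8.0, 2.0],
}

clusters, merges = cluster_samples(expression, n_clusters=2)
for left, right, height in merges:
    print(f"merge {left} + {right} at distance {height:.3f}")
print("final clusters:", [sorted(c) for c in clusters])
```

This quadratic-time loop is only meant to make the idea concrete; with real array data one would more likely use an established implementation such as SciPy's `scipy.cluster.hierarchy.linkage`, and cluster the genes as well as the tissues (the transposed matrix), as the Example describes.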

Three of the genes were found to have their peak expression within the HSC
compartment: the scaffolding protein Gab1 (GRB2-associated binding protein 1) and the
uncharacterized gene A430017F18, both of which displayed their highest expression in the
LTR-enriched CD38+CD34- cells, and the Pdgfrb gene (platelet-derived growth factor
receptor, beta polypeptide), whose expression peaked within the CD38+CD34+ STR HSC subset.
Although the majority of these genes are also expressed at comparable levels in other
tissues, in many cases the level of expression in the HSC subsets was at or near the peak
expression determined for these genes across the entire 45-tissue panel. The high relative
expression of this subset of genes within HSCs indicates that they are likely to play an
important role in HSC biology.
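Identifying genes whose expression peaks in an HSC subset, or lies at or near the panel-wide peak, reduces to comparing each gene's HSC values with its maximum across all samples. The sketch below assumes linear-scale expression values and an arbitrary 80% cutoff for "at or near the peak"; the Example specifies neither. The gene names, sample values and the function `hsc_peak_report` are hypothetical placeholders, not results from this study.

```python
# Sample labels for the three HSC subsets (assumed naming).
HSC_SUBSETS = ("CD38+CD34-", "CD38+CD34+", "CD38-CD34+")


def hsc_peak_report(panel, near_fraction=0.8):
    """Flag genes whose highest expression falls in an HSC subset.

    panel maps gene -> {sample name: expression value}. near_fraction is an
    assumed cutoff: an HSC value >= near_fraction * panel maximum counts as
    "at or near the peak".
    """
    report = {}
    for gene, values in panel.items():
        peak_sample = max(values, key=values.get)  # sample with the top value
        panel_max = values[peak_sample]
        best_hsc = max((values[s] for s in HSC_SUBSETS if s in values), default=0.0)
        report[gene] = {
            "peak_sample": peak_sample,
            "peaks_in_hsc": peak_sample in HSC_SUBSETS,
            "near_peak_in_hsc": best_hsc >= near_fraction * panel_max,
        }
    return report


# Hypothetical genes and made-up values for illustration only.
panel = {
    "geneA": {"CD38+CD34-": 950, "CD38+CD34+": 600, "CD38-CD34+": 400, "liver": 300, "brain": 200},
    "geneB": {"CD38+CD34-": 300, "CD38+CD34+": 820, "CD38-CD34+": 500, "liver": 150, "brain": 100},
    "geneC": {"CD38+CD34-": 700, "CD38+CD34+": 500, "CD38-CD34+": 450, "liver": 780, "brain": 90},
    "geneD": {"CD38+CD34-": 100, "CD38+CD34+": 120, "CD38-CD34+": 90, "liver": 900, "brain": 60},
}

for gene, result in hsc_peak_report(panel).items():
    print(gene, result)
```

If the values are log-transformed, a fixed difference from the maximum (rather than a ratio) would be the more natural way to express "near the peak".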

*** 

It is understood that the examples and embodiments described herein are for 
illustrative purposes only and that various modifications or changes in light thereof will be 
suggested to persons skilled in the art and are to be included within the spirit and purview of 
this application and scope of the appended claims. Although any methods and materials 
similar or equivalent to those described herein can be used in the practice or testing of the 
present invention, the preferred methods and materials are described. 
All publications, GenBank sequences, patents and patent applications cited
herein are hereby expressly incorporated by reference in their entirety and for all purposes,
as if each were individually so denoted.